1185 Avenue Of The Americas
Assistant Commissioner for Patents
Washington, D.C. 20231
Box: Patent Application 



Sir:



PRELIMINARY AMENDMENT TO THE ACCOMPANYING CONTINUATION
APPLICATION FILED UNDER 37 C.F.R. §1.53

Applicants request that the following amendment be made in the 
above-identified application:

In the Specification:

On page 1, after the title and before line 3, please insert the 
following new sentence: 

--This application is a continuation of PCT International
Application No. PCT/GB98/00542, filed 20 February 1998,
designating the United States of America and
claiming priority of British Patent Application No.
9703681.8, filed February 21, 1997, the contents of which
are hereby incorporated by reference into the present
application.--



Wyatt Paul, et al. 
U.S. Serial No.: Not Yet Known 
(Continuation of PCT/GB98/00542,
Filed: Herewith

In the Claims: 

Please amend the claims as follows: 

In claim 3, delete "or claim 2";

In claim 4, delete "any one of claims 1 to 3" and insert
--claim 1--;

In claim 5, delete "any one of claims 2 to 4" and insert
--claim 2--;

In claim 6, delete "any one of claims 1 to 5" and insert
--claim 1--;

In claim 8, delete "any one of claims 1 to 5" and insert
--claim 1--;

In claim 9, delete "any one of preceding claims" and insert
--claim 1--;

In claim 10, delete "any one of claims 3 to 9" and insert
--claim 3--;

In claim 11, delete "any one of claims 5 to 10" and insert
--claim 5--;

In claim 12, delete "any one of claims 1 to 11" and insert
--claim 1--;

In claim 13, delete "any one of claims 1 to 12" and insert
--claim 1--;

In claim 14, delete "any one of preceding claims" and insert
--claim 1--;

In claim 17, delete "in claim 16";

In claim 18, delete "any one of claims 15 to 17" and insert
--claim 15--;

In claim 19, delete "any one of claims 15 to 18" and insert
--claim 15--;

In claim 20, delete "any one of claims 15 to 19" and insert
--claim 15--;

In claim 22, delete "any one of claims 15 to 19" and insert
--claim 15--;

In claim 23, delete "any one of claims 15 to 22" and insert
--claim 15--;

In claim 24, delete "any one of claims 15 to 23" and insert
--claim 15--;

In claim 25, delete "any one of claims 15 to 24" and insert
--claim 15--;

In claim 26, delete "any one of claims 15 to 25" and insert
--claim 15--;

In claim 27, delete "any one of claims 15 to 26" and insert
--claim 15--;

In claim 28, delete "any one of claims 15 to 27" and insert
--claim 15--;

In claim 29, delete "any one of claims 1 to 14 or by a
method as claimed in any one of claims 15 to 28" and insert
--claim 1--;



REMARKS 

This application is a continuation of PCT International 
Application No. PCT/GB98/00542, filed 20 February 1998,
designating the United States of America and claiming priority 
of British Patent Application No. 9703681.8, filed February 21, 
1997. 

By this Preliminary Amendment, applicants have hereinabove 
amended the specification on page 1 to insert the continuation 
data. Applicants have also amended claims 3-6, 8-14, 17-20, and 
22-29. Applicants maintain that the amendments made hereinabove 
do not raise any issue of new matter. Accordingly, applicants 
respectfully request entry of the Amendment. 

If a telephone interview would be of assistance in advancing
prosecution of the subject application, applicants' undersigned
attorney invites the Examiner to telephone at the number provided
below.

No fee other than the filing fee of $940.00 is deemed necessary
in connection with this Preliminary Amendment. However, if any
other fee is required, authorization is hereby given to charge
the amount of such fee to Deposit Account No. 03-3125.

New York, New York 10036
(212) 278-0400 
To all whom it may concern:



Be it known that we,
Wyatt Paul, Pascual Perez, Eric Huttner, and Andreas Stefan Betzner 
have invented certain new and useful improvements in 

PROTEIN COMPLEMENTATION IN TRANSGENIC PLANTS 



of which the following is a full, clear and exact description. 



PROTEIN COMPLEMENTATION IN TRANSGENIC PLANTS 



This invention relates to pairs of parent plants for 
producing hybrid seeds and to methods for producing
plants with a desired phenotype. The desired phenotype
is an active enzyme, a regulatory protein or a protein
which affects the functionality and/or viability and/or
structural integrity of a cell. Preferably, the desired
phenotype is substantially absent from the parent 
plants/lines. In particular, the invention relates to 
parent plants and methods involving plant lines for 
producing male-sterile plants and seeds. 

The present invention describes a protein
complementation system, with a variety of different
applications. The system can be explained and
exemplified with reference to obtaining male-sterile
plants and embryoless seeds although it is not limited
to these applications.

The use of dominant Artificial Male Sterility (AMS) in 
plants is described in WO95/20668. This document 
describes a binary system using two genes which together 
(but not in isolation) cause male sterility. The genes 
are brought together by crossing plants, each parent 
being homozygous for the gene, which generates a
homogeneous population of male sterile plants.
WO95/20668 describes several ways to implement the gene
binary system, including the following:

i. a system based on activation of transcription: a
transcriptionally inactive AMS gene is activated
upon crossing by provision of the relevant
transcription factor;

ii. a system based on activation of splicing: an AMS
gene inactivated by the presence of an intron is
activated upon crossing by provision of the
relevant maturase;

iii. a system based on the suppression of a stop codon
during translation: an AMS gene inactivated by
introducing an artificial stop codon in the ORF is
activated upon crossing by provision of an
artificial stop suppressor tRNA for the introduced
stop codon;

iv. a system based on sequence-specific gene
inactivation: one parent contains a modified male
fertility gene and a transgene which inactivates
only the unmodified male fertility gene. The other
parent contains a transgene which inactivates only
the modified male fertility gene. In the hybrid
both the modified and unmodified male fertility
genes are inactivated, causing male sterility;

v. a system based on preventing restoration of male
fertility by a restorer gene: the first parent
contains the AMS gene and the restorer gene, and
the second parent contains a gene inhibiting the
action of the restorer gene.

However, the binary systems described above have so far 
proved complex to implement and have encountered a 
variety of difficulties. 



For example, it has been found that the use of a
suppressor tRNA (described in Betzner et al. 1996,
Abstract of the 14th International Congress of Plant
Reproduction, Lorne, Australia) can have deleterious
consequences for some plant species. While this does
not preclude its use, it does make the screening of
suitable transgenic plants more labour intensive than
desirable. Another example is the leakiness of the T7
promoter (described in EP-A-0589841). Some plants
transformed with a T7 promoter driving Barnase were
sterile in the absence of the T7 RNA polymerase. Again,
this does not preclude use of the system but it does
make it difficult to identify suitable transgenic
plants. Furthermore, in certain plants the gene binary
system is sub-optimal since not all of the required
genetic elements are fully characterised.

Two areas of prior art have been explored which have 
resulted in a phenotype conferred to a plant by the 
combination of two proteins. 

In 1989, Hiatt and coworkers (Nature, vol. 342, p. 76-78)
described the production of a functional antibody in
tobacco by crossing tobacco plants expressing a gamma
immunoglobulin gene and a kappa immunoglobulin gene.

Problems were, however, encountered with this system.
Since the light and heavy chains of an antibody interact
through disulfide bridges, the bridges were unable to
form in the reducing environment present in the
cytoplasm. Assembly of a functional antibody in plants
thus requires that both chains are targeted to the
endoplasmic reticulum then secreted to the apoplast (the
space between cells). The production of antibodies in
plants has thus been limited to the production of
secreted antibodies or the production of single chain
antibodies.



In 1992 Lloyd et al. (Science, vol. 258, p. 1773-1775)
described the transfer in Arabidopsis and tobacco of two
maize genes coding for the transcription factors R and
C1. Ectopic expression of these genes separately in
heterologous plants has some effect on the transcription
of endogenous genes. In particular the genes have some
effect in isolation, and this may preclude their use for
applied purposes. Co-expression of the two genes had
more dramatic qualitative and quantitative effects than
expression of either gene alone. However, these genes
have properties severely limiting their usefulness and
their lack of general applicability is described in the paper.

It has been shown that the Arabidopsis transcription
factors Apetala3 and Pistillata can be ectopically
co-expressed, and in concert cause a new phenotype
in the Arabidopsis flower (Krizek and Meyerowitz, 1996,
Development, vol. 122, p. 11-22). The limitations
described above for the R/C1 proteins also apply in this
case.

The present invention describes a protein
complementation system which overcomes many of the
problems and difficulties associated with known gene
binary systems. The protein complementation system
according to this invention is based on the expression
of two or more gene sequences in a single plant, the
encoded polypeptides/proteins of which associate, interact
or come together to form an active enzyme, a regulatory
protein or a protein which affects the functionality and/or
viability and/or the structural integrity of a cell.
Hereinafter, in this text all references to a protein
which affects the structural integrity of a cell also
describe a protein which may, in addition, or
alternatively, affect the functionality and/or viability
of a cell. Some polypeptides/proteins may fall in more
than one of these categories. None of the individual
gene sequences present in a given plant leads to a
significant phenotypic effect in these plants.

The present invention describes the creation of a plant
which has a desired phenotype through expression of an
active enzyme, regulatory protein or protein which
affects the structural integrity of a cell (eg. a
membrane destabilising protein). The plant may be
obtained by crossing a pair of parent plants a and b.
Plant a contains one or more gene sequences which encode
a polypeptide(s) or protein(s) (A) with little or no
activity so that the desired phenotype is not
significantly (or substantially) caused by expression of
the one or more genes in plant a alone. Plant b also
contains one or more gene sequences which encode a
polypeptide(s) or protein(s) (B), also with little or no
activity so that the desired phenotype is not
significantly (or substantially) caused by expression of
the one or more genes in plant b alone. When plants a
and b are crossed, the resulting hybrid expresses both
polypeptides and/or proteins A and B. These two
polypeptides/proteins associate, interact or come
together to form an active enzyme, regulatory protein or
protein which affects the structural integrity of the
cell, with the result that the daughter plant displays
the desired phenotype. NB: From hereon, when discussing
the polypeptides/proteins A or B they will be referred
to only as 'polypeptides' for the sake of convenience.

This protein complementation binary system is simpler
than the previously described binary systems since there
is no need for interaction between genes, no required
modification of the expression of genes and no
modification of the level of expressed polypeptides in
the daughter plant compared to the parent plants.
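
The crossing scheme described above can be illustrated with a minimal sketch. It is a toy model only and not part of the described method: the names `Plant`, `has_phenotype` and `cross` are assumptions introduced here, and both parents are assumed to be homozygous so that every hybrid inherits what either parent expresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plant:
    """Toy plant: records which complementing polypeptides it expresses."""
    expresses_a: bool
    expresses_b: bool


def has_phenotype(plant: Plant) -> bool:
    # The active protein forms only when A and B are present together.
    return plant.expresses_a and plant.expresses_b


def cross(parent_1: Plant, parent_2: Plant) -> Plant:
    # Assumption: both parents are homozygous, so the hybrid expresses
    # every polypeptide that either parent expresses.
    return Plant(
        expresses_a=parent_1.expresses_a or parent_2.expresses_a,
        expresses_b=parent_1.expresses_b or parent_2.expresses_b,
    )


plant_a = Plant(expresses_a=True, expresses_b=False)
plant_b = Plant(expresses_a=False, expresses_b=True)
hybrid = cross(plant_a, plant_b)
```

In this sketch neither `plant_a` nor `plant_b` shows the phenotype on its own, while `hybrid` does.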

The present invention is described with reference to the
Figures which are:

FIGURE 1A; Barnase coding sequence;
FIGURE 1B; Intergenic sequence;
FIGURE 1C; Barstar coding sequence;
FIGURE 1D; Translational fusion of ORF Peptide A** /
(Gly4-Ser)3 Linker peptide / GUS;
FIGURE 1E; Nucleotide sequence of Translational fusion
of Ubiquitin genomic sequence and ORF Peptide A***;
FIGURE 1F; Nucleotide sequence of Translational fusion
of Ubiquitin genomic sequence and ORF peptide B***;
FIGURE 1G; DNA sequence of IPCR (inverse polymerase
chain reaction) primers (example 1);
FIGURE 2; Schematic illustration of pepA*
and pepB* construction by Inverse PCR (IPCR);
FIGURE 3A; In vitro construction from synthetic
oligonucleotides of S-peptide, S(+5)-protein and
S-protein;
FIGURE 3B; In vitro construction from synthetic
oligonucleotides of the sequence encoding the S-peptide
and the (Gly4-Ser)3 linker;
FIGURE 4A; protein and DNA sequences of S-peptide and
S-peptide with (Gly4-Ser)3 linker;
FIGURE 4B; protein and DNA sequences of S(+5)-protein
and S-protein;
FIGURE 4C(i); PCR amplification product encoding partial
AOX3 targeting signal;
(ii); ORF encoding AOX3 targeting sequence
(underlined) and S-peptide;
(iii); ORF encoding AOX3 targeting sequence
(underlined) and S-peptide/(Gly4-Ser)3/GUS;
(iv); ORF encoding AOX3 targeting sequence
(underlined) and S-protein;
(v); translational fusion of Ubiquitin genomic
sequence and ORF of S-protein;
FIGURE 4D; nucleotide sequence of IPCR primers (example
3);
FIGURE 5; production scheme for embryoless maize grains.

Embryoless seeds harvested from female rows only = 100%
of embryoless maize seeds

or

Seeds harvested from all the field plants =
approximately 80% of embryoless maize seeds:
note that if this sort of seed harvesting is suitable, a
random sowing with 10% of male plants and 90% of female
plants is desirable and possible.

Legend

male parent A
expressing pepA* in embryos
Genotype: emb-pepA*/emb-pepA*
or
emb-pepA* linked to herbicide
resistance/emb-pepA* linked to
herbicide resistance

female parent B
expressing pepB* in embryos only
Genotype: emb-pepB*/emb-pepB* in a
male sterile cytoplasmic environment
or
emb-pepB*/emb-pepB*
Artificial Male Sterility linked to
Herbicide Resistance/+

According to a first aspect of the invention there is 
provided a pair of parent plants for producing seeds 
comprising:

(i) a first parent plant containing one or more 
gene sequences encoding a polypeptide A; and 

(ii) a second parent plant containing one or more 
gene sequences encoding a polypeptide B; 

wherein the polypeptides A, B, when expressed separately
in different plants, do not form an active enzyme, a
regulatory protein or other protein which affects the
structural integrity of the cell but when expressed in
the same plant do form an active enzyme, regulatory
protein or other protein which affects the structural
integrity of the cell. Presence of the active enzyme,
regulatory protein or protein which affects the
structural integrity of the cell in a single plant is
the desired phenotype.

The present invention includes the scenario of inter-
extra-genic repression/complementation/suppression; that
is, where a mutation in one subunit of a multi-subunit
complex can complement a mutation in another sub-unit in
order to restore the active enzyme, regulatory protein
or protein affecting the structural integrity of the
cell. In such a scenario, the polypeptide(s)/protein(s)
A and B may be the same in the two parent plants, with
the exception of the different mutations. Examples
include the E.coli regulatory proteins as described by
Tokishita S.I., and Mizuno T., 1994, Mol. Microbiol.
(UK), 13/2, 435-444 and the GroES and GroEL proteins of
E.coli as described by Zeilstra-Ryalls J., et al., 1994,
J. Bacteriol. (US), 176, (21), 6558-65.

In the present invention, the pair of parent plants can
be described as a pair of complementary plants for
producing hybrid seeds or even a pair of complementary
transgenic plants for producing transgenic hybrid seeds.

It is most likely that at least one of the pair of
parent plants is transgenic. When used herein the term
'transgenic' refers not only to genetic material from
another species but also to genetically manipulated DNA
from the same plant or species. The genetic manipulation
of the plant may be by a microbiological process such as
the use of Agrobacterium tumefaciens (Horsch R.B., Fry J.E.,
Hoffman N.L., Eichholtz D., Rogers S.G., Fraley R.T.,
(1985), Science, 227: 1229-1231). Alternative
manipulations include biolistic transformation, a
technique also well known in the art, the use of
Agrobacterium rhizogenes, particle gun, electroporation,
polyethylene glycol or silica fibers.

The present invention may be applied to any plant, in
particular, maize, wheat, tomato, oilseed rape, barley,
sunflower, linseed, peas, beans, melon, pepper, squash,
cucumber and egg plant (aubergine) and other broad acre
plants.

Use of the term "one or more gene sequences encoding a
polypeptide...." refers to any number of stretches of
genetic material (preferably DNA) which can encode one
or more peptides/polypeptides/proteins. Thus
"polypeptides" A or B can actually comprise more than
one amino acid sequence which may or may not be linked
or associated. There is no restriction on the location
in the parent plant genome of the one or more gene
sequences. Where more than one gene sequence is
present, encoding for more than one
peptide/polypeptide/protein, the relationship between
the encoded sequences in each parent plant is only
relevant to the extent that the parent plant does not
display the desired phenotype (to any significant
level). When the one or more gene sequences encoding a
polypeptide A are expressed in the same plant as the one
or more gene sequences encoding polypeptide B, then the
result, according to the invention, is the phenotype of
an active enzyme, a regulatory protein or a protein
which affects the structural integrity of a cell.
Proteins which affect the structural integrity of a cell
include proteins that destabilise or create holes or ion
channels in cellular membranes.

A particular application of the present invention is the
production of male-sterile plants. Accordingly, the
polypeptides A, B when expressed in the same plant may
cause male-sterility by ablation of the tapetum. An
alternative application, also of the first aspect of the
invention, is the expression of polypeptides A, B in the
same plant which form an active enzyme, a regulatory
protein or protein which affects the structural
integrity of a cell, which, through cell ablation in a
specific tissue, results in a different phenotype, as
described below.

In addition to causing male-sterility, potent hydrolases 
like Barnase can be used for other applications where 
cell ablation is needed, for example to remove an 
unneeded organ from a hybrid crop. This may contribute 
to reducing downstream processing costs. One example is 
the production of embryoless seeds, which is now 
described as follows: In the production of flour (from 
wheat) or semolinas (from maize or wheat) or corn flakes 
(from maize) or for other uses, it would be desirable to 
have seeds with no embryo. The use of embryo specific 
promoters in the first aspect of the invention above 
would enable ablation of embryos in seeds, in a cross 
dependent manner. That is, in the seeds produced by the
plant containing one or more gene sequences encoding
polypeptide A, pollinated with pollen from a plant
containing one or more gene sequences encoding
polypeptide B. Self pollination of plant a has to be
prevented, for example by making plant a male-sterile.
A possible production scheme for valuable embryoless
maize grains would be the following: generate a plant
containing one or more gene sequences encoding
polypeptide A (plant a) and a plant containing one or
more gene sequences encoding polypeptide B (plant b),
designed so that combination of polypeptide A and
polypeptide B in one seed results in embryo ablation.
Figure 5 shows a production scheme for embryoless maize
grains according to the invention.



The biochemical composition of plants can also be
manipulated according to the first aspect of the
invention, for example, by fatty acid biosynthetic
enzymes. Where the presence of an unusual but valuable
fatty acid in the plant has a deleterious effect on the
plant, it would be useful to be able to produce seeds
with the unusual (fatty acid) oil through a cross
between two lines having a normal (or quasi normal) oil
composition (to the extent that each parent line is not
deleteriously affected). Splitting the enzyme
responsible for the valuable fatty acid biosynthesis in
two or more inactive parts provides a practical way of
producing the seeds with the valuable oil. Where the
enzyme responsible for the desired trait is
heteromultimeric, separating the genes for the various
monomers in the two parent plants is a simple way to
implement the invention. More generally, this invention
can be used to obtain hybrid seeds or hybrid plants with
a particular phenotype which neither parent has. In
particular, this invention can be used to create hybrid
plants, resistant to a herbicide, via the crossing of
two parent plants. Each of the parent plants expresses
one or more non-functional parts of an active enzyme,
regulatory protein or protein which affects the
structural integrity of a cell, which is directly or
indirectly responsible for herbicide resistance. As the
one or more genes in each parent plant responsible for
the trait will segregate independently, this will result
in the gametes of such hybrid plants (especially pollen
grains) giving rise to a lower transfer of the herbicide
resistance trait to relatives or to weeds (in comparison
with a classical single gene). If the hybrid seed is
the harvested desirable product, expression of the
desired trait would be restricted to the seed endosperm
and embryo since these tissues are genetically hybrid.
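
The effect of independent segregation on trait transfer through pollen can be estimated with a short calculation. The sketch below is an illustration under stated assumptions, not data from this application: the hybrid is taken to carry one copy each of two unlinked transgenes ("A" and "B"), and the comparison case is a hybrid carrying one copy of a single resistance gene ("R"). The function name and genotype encoding are hypothetical.

```python
from fractions import Fraction
from itertools import product


def gamete_fraction_with(genotype, required):
    """Fraction of gametes carrying every transgene named in `required`.

    `genotype` holds one allele pair per locus; "-" marks the absence of
    a transgene. Loci are assumed to be unlinked.
    """
    gametes = list(product(*genotype))  # one allele drawn from each locus
    hits = sum(1 for g in gametes if all(t in g for t in required))
    return Fraction(hits, len(gametes))


# Hybrid from the two-part system: one copy of A, one copy of B.
two_part_hybrid = (("A", "-"), ("B", "-"))
# Comparison: hybrid carrying one copy of a single resistance gene.
single_gene_hybrid = (("R", "-"),)

two_part = gamete_fraction_with(two_part_hybrid, {"A", "B"})
single = gamete_fraction_with(single_gene_hybrid, {"R"})
```

Under these assumptions a quarter of the pollen grains carry both parts of the trait, against half for the single gene, which is consistent with the lower transfer described above.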

The active enzyme, regulatory protein or protein which
affects the structural integrity of a cell is preferably
tissue specific (ie. present only in a
selected tissue). This requires that one or both of the
gene sequences encoding the polypeptides A, B are
operatively linked to an appropriately stimulated
promoter, eg. a tissue specific promoter so as to
produce the desired phenotype. Where only one of the
polypeptides is limited to expression in a selected
tissue, the other polypeptide requires constitutive
expression or at least an expression pattern which
overlaps with that of the first polypeptide.

As described above, the expression may be seed or embryo
specific and promoters for such tissue specificity are
well known in the art. In the case of male-sterility,
the promoter is preferably tapetum specific. Such
promoters known in the art include the TA29 promoter
(EP-A-0344029), the A9 promoter (Paul et al., 1992, Plant
Molecular Biology, vol. 19, p. 611-622) and the
promoters described in WO95/29247. In order for
heterozygous plants to have the desired phenotype,
promoters must be active at the sporophytic level.

The choice of gene sequence for producing an active 
enzyme, regulatory protein or protein which affects the 
structural integrity of a cell depends, of course, on 
the desired phenotype. Any gene sequence encoding an 
active enzyme, regulatory protein or protein which 
affects the structural integrity of a cell can be used 
provided that the protein activity can result from the 
association, interaction or combination of two or more
polypeptides encoded by two or more gene sequences and
that their activity can result in the desired phenotype.

Immediately obvious proteins which can be suitable are
those which are naturally encoded by two or more
polypeptides and which self-assemble to form the final
protein structure. The individual polypeptide units
(subunits) should have no significant activity in vivo.

Suitable proteins for use according to the invention
include natural heterodimeric proteins such as the C1-R
maize proteins and the Apetala3-Pistillata (Ap3-Pi)
Arabidopsis thaliana proteins. When present in the
tapetum, the dimer protein Ap3-Pi can activate genes
responsive to this transcription factor (which would
normally be inactive because this transcription factor
is normally absent from, or present at a low level in,
the tapetum). The activated gene is preferably, but not
necessarily, endogenous to the plant of interest. For
example, expressing the dimer Ap3-Pi in the tapetum of
maize will activate transcription of genes normally
involved in flower development in other floral organs,
and will prevent normal pollen maturation. The level of
sterility of such a system can be improved by also
engineering into the daughter plant a gene sequence
which is affected by the produced active enzyme or
regulatory protein.

One example is the introduction into one of the parent
lines of a gene sequence from Barnase or PR-Glucanase
under the control of the Apetala3 promoter (pApetala3).
The Apetala3 promoter is responsive to the Ap3-Pi dimer
and thus expression of the Barnase or PR-Glucanase
protein occurs in the daughter plant. Such a system
provides for the enhancement of plant male-sterility
with the additional advantage of being under a strict
control mechanism (via the pApetala3). Thus, the cause
of the desired phenotype may be direct, ie. a direct
result of the active enzyme, regulatory protein or
protein which affects the structural integrity of a
cell, or may be indirect, ie. acting via an intermediate
factor. Other transcription factors, for use in the
invention, exist already as, or can be engineered to, a
heterodimeric form, for example using the dimerisation
domains described below. These include artificial
transcription factors made by the association of a DNA
binding domain and an activation domain of different
origins.

An alternative use of the Apetala3-Pistillata system is
the complementation of mutations in sub-units of the
proteins. For example, one parent plant may express
both proteins but with a mutation in one or the other so
that the plant does not have the active dimer. The
other parent plant may also express both proteins, in
this case, a mutation being in the other protein. The
second parent plant would not express the active dimer.
A cross between the two parent plants would result in
expression of genes to produce an active dimer.

Ectopic expression of the subunits for these
transcription factors can be used to modulate expression
of their target gene and cause male sterility or other
traits (including pleiotropic effects) in a
cross-dependent manner.

It is also possible to use, according to the first
aspect of the invention, proteins which have to be
"artificially" split into two or more nucleic acid
coding sequences. The resulting polypeptides/proteins
must associate, assemble, interact or come together
when expressed in the same plant to form an active
enzyme, regulatory protein or protein which affects the
structural integrity of a cell. Such artificial
splitting of enzymes and proteins is today easily
achieved by predicting where the protein can be split
into two or more domains, for example predicting by
structural biochemistry such as X-ray crystallography,
functional protein analysis in mutants, structure
prediction from sequence analysis or by limited
proteolysis, amongst other techniques. In this way, the
random coil or other suitable regions are identified as
places where the protein may be split.

Examples of artificially split proteins include: 

Barnase: This protein has been widely used to cause cell
ablation, when expressed in specific tissues. Under the
control of a tapetum specific promoter, expression of a
Barnase gene causes male-sterility in many plant species
(EP-A-0344029). It is known that the Barnase protein
can be split into two polypeptides, which per se have no
catalytic activity [in vitro]. When put together the
two polypeptides can self-assemble to produce an enzyme
whose product has RNase activity (Sancho and Fersht,
1992, J. Mol. Biol., 224, 741-747).

RNase A can also be used. It was shown, as long ago as
1959 (Richards and Vithayathil, J. Biol. Chem., 234,
1459-1465) that RNase A can similarly be split by mild
proteolytic treatment into two polypeptides which can
then reassociate and produce an active enzyme.



In order to implement a system, according to the present
invention, involving artificially split proteins, it may
be necessary to design genetic constructs in order to 
express the polypeptides therefrom. In order to design 
the genetic constructs whose products will associate to 
form the active enzyme some modifications may be 
required. For example, a methionine codon can be added 
in front of the ORF encoding the second half of the 
active enzyme and a stop codon can be added after the 
ORF encoding the first half of the active enzyme. If 
the polypeptides are expressed as the C terminal part of 
a translational fusion to another protein or to a 
protein targeting sequence, then a start codon may be 
absent from the ORF of polypeptide A and/or polypeptide 
B, whereas a stop codon is still needed to terminate the 
ORF of polypeptide A and polypeptide B, respectively. 
If polypeptide A or B is expressed as the N-terminal
part of a translational fusion to another protein, then
the ORF of polypeptide A or B will start with a 
methionine codon whereas the termination codon is 
provided by the ORF of the other protein to which it is 
fused. Such genetic construct design is commonplace 
and well known to the person skilled in the art. 
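
The codon modifications described above can be sketched in a few lines. This is a minimal illustration only: the function `split_orf`, the choice of TAA as the added stop codon and the six-codon toy sequence are assumptions made here and do not correspond to any construct of the invention.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_orf(orf: str, split_codon: int) -> tuple[str, str]:
    """Split an ORF after `split_codon` codons into two expressible ORFs.

    The first half keeps the original start codon and gains a stop codon;
    the second half gains an ATG (methionine) codon and keeps the
    original stop codon.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    if not orf.startswith("ATG") or orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must begin with ATG and end with a stop codon")
    n_codons = len(orf) // 3
    if not 1 <= split_codon < n_codons - 1:
        raise ValueError("split point must fall inside the coding region")
    cut = 3 * split_codon
    first_half = orf[:cut] + "TAA"   # stop codon added after the first half
    second_half = "ATG" + orf[cut:]  # methionine codon added before the second half
    return first_half, second_half


# Hypothetical six-codon ORF, used only for illustration.
toy_orf = "ATGGCACAGGTTATCTAA"
orf_a, orf_b = split_orf(toy_orf, 3)
```

For a translational fusion, the same idea applies with the start or stop codon omitted on the side that is supplied by the fusion partner, as the text explains.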

The invention may also be practised by expressing two 
portions of two different enzymes that together give a 
different activity than either of the intact parent
proteins.

Preferably, both parent plants are homozygous with 
respect to the gene sequences encoding polypeptide A or 
polypeptide B. Such genotypes ensure that all offspring 
will express the active enzyme, regulatory protein or 
protein which affects the structural integrity of the
cell. 

If one* or more of the polypeptides (A or B) is /are small 
and there are doubts that any of them will be stable in 
a cell, it is possible to use well-known systems wherein 
the small polypeptide is fused in frame to a "carrier 
protein" which protects it from being degraded or 
increases its proteolytic stability, but retains its 
freedom to interact with the other polypeptide(s) to 
form the active enzyme, regulatory protein or protein 
which affects the structural integrity of a cell. 

The carrier protein can be chosen so that the 
polypeptides A or B are not affected by the fusion. One 
suitable carrier protein is the β-Glucuronidase (GUS) 
protein, which tolerates addition to its NH2 end, and is 
a good reporter gene in plants. In this case, one can 
use the level of GUS activity to evaluate the expression 
level of the fused small polypeptide. This can be 
useful for screening primary transformants. Another 
suitable carrier protein is ubiquitin (Hondred and 
Vierstra, 1992, Curr. Opin. Biotechnol. 3, 147-151; 
Vierstra, 1996, Plant Mol. Biol., 32, 275-302). When 
fused in frame to the carboxy-terminus of ubiquitin, 
proteins accumulate significantly in the plant 
cytoplasm. In addition, artificial ubiquitin protein 
fusions resemble natural ubiquitin extension proteins, 
e.g. UBQ1 of Arabidopsis thaliana (Callis et al., 1990, 
J. Biol. Chem., 265, 12486-12493), in that they are 
cleaved precisely at the C-terminus of ubiquitin (after 
Gly 76) by specific endogenous proteases. This process 
releases the "attached" protein or peptide moiety from 
the fusion protein and thus permits polypeptide A and B 
to assemble into a functional enzyme or protein. Also, 
for the purpose of protecting small proteins from 
cytoplasmic proteolysis, translational products may be 
enlarged by fusing them to protein targeting signals, 
e.g. the C-terminus (Whelan and Glaser, 1997, Plant Mol. 
Biol. 33, 771-781) and be directed to specific locations 
in the cell such as to mitochondria. A suitable signal, 
for example, is the one found in the AOX3 protein of 
soybean (Finnegan and Day, Plant Physiol., 1997, 114, pp 
455) which would add 50 amino acids to polypeptide A and 
B, respectively. Import-associated proteolytic 
processing will remove the targeting signal by cleavage 
after Met50 thereby releasing the free polypeptides A 
and B into the mitochondria where they combine to 
disrupt mitochondrial function and thus to compromise 
cell viability. 
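
The release of a polypeptide from its carrier or targeting signal can be pictured as a cut at a fixed residue position. The sketch below is a toy model only: `release_cargo` is a hypothetical helper, and the sequences in the usage note are placeholders, not the real ubiquitin or AOX3 sequences. It models cleavage after residue 76 (ubiquitin, Gly 76) or after residue 50 (targeting signal, Met50).

```python
def release_cargo(fusion: str, cleavage_after: int) -> tuple[str, str]:
    """Cut a fusion protein after residue `cleavage_after` (1-based).

    Returns (carrier_or_signal, released_polypeptide).
    """
    if not 0 < cleavage_after < len(fusion):
        raise ValueError("cleavage position must lie inside the fusion protein")
    return fusion[:cleavage_after], fusion[cleavage_after:]
```

With a 76-residue placeholder carrier ending in G and a cargo such as `AQVINT`, `release_cargo(carrier + cargo, 76)` returns the carrier and the intact cargo; a 50-residue placeholder signal ending in M behaves the same way with `cleavage_after=50`.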

In some cases, when expressed in two or more portions, 
the polypeptides may not spontaneously associate, 
assemble, interact or come together in vivo to reform an 
active protein, or regulatory enzyme or protein which 
affects the structural integrity of a cell. In other 
cases the association of the polypeptides may be weak so 
that little active reconstituted protein is formed. To 
circumvent these problems, each protein portion may be 
linked to a protein dimerisation domain, thus enabling 
the portions to be brought together in vivo. Such 
protein dimerisation domains are found in many proteins 
that naturally form dimers or multimers and the linking 
technique is well known in the art. 

For example, the human cysteine-rich protein LIM double 
zinc finger motif has been fused to the Gal4 and VP16 
proteins. In contrast to the unmodified Gal4 and VP16 
proteins the LIM-Gal4 and LIM-VP16 associate in vitro 
and in vivo (in NIH 3T3 mammalian cells) forming an 
active transcription factor (Feuerstein et al., 1994, 
Proc. Natl. Acad. Sci. U.S.A. 91, 10655-10659). The LIM 
motif is found in many organisms. For example, a 
sunflower pollen specific protein with a LIM domain has 
been identified (Baltz et al., 1996, Plant Physiology 
(Supplement III, 59)). Other protein dimerisation 
domains exist such as the leucine zipper (Turner, R. and 
Tjian, R., 1989, Science, 243, 1689-1694), the 
helix-loop-helix (Murre et al., 1989, Cell, 56, 777-783), 
the ankyrin (Blank et al., 1993, Trends in Biochemical 
Sciences, 17, 135-140) and the PAS (Huang et al., 1993, 
Nature, 364, 259-262) domains. 

One may also wish to ensure that the genes encoding 
polypeptides A or B are inserted in the genomes of 
parents a and b at an identical position (or at tightly 
linked positions) so that their chance of co-segregation 
in the transgenic hybrid is low. This can be 
advantageous, for example in the production of hybrid 
seed since the two genes that are used to create the 
male-sterile parent plant will subsequently segregate. 
Thus, F1 hybrid progeny are 100% male fertile since no 
hybrid plant can inherit both components of the 
male-sterility system. 
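
The segregation argument can be illustrated by enumerating gametes. The sketch below assumes that A and B sit at exactly the same locus on homologous chromosomes (no recombination between them) and that "-" denotes a chromosome without either gene; the function names and the wild-type pollinator are illustrative assumptions only.

```python
Genotype = tuple[str, str]  # the two alleles carried at the single insertion locus


def offspring(parent1: Genotype, parent2: Genotype) -> list[Genotype]:
    """All equally likely offspring genotypes: one allele from each parent."""
    return [(g1, g2) for g1 in parent1 for g2 in parent2]


def has_both(genotype: Genotype) -> bool:
    """True if the plant carries both components A and B."""
    return "A" in genotype and "B" in genotype


# Crossing parent a (A/A) with parent b (B/B) gives only A/B plants.
sterile_line = offspring(("A", "A"), ("B", "B"))

# An A/B plant pollinated by a plant lacking both genes passes on A or B, never both.
next_generation = offspring(("A", "B"), ("-", "-"))
```

Under these assumptions every plant in `sterile_line` carries both components, while no plant in `next_generation` does, because each gamete of the A/B plant receives only one of the two homologous chromosomes.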

The gene sequences carried by the parent plants a and b 
which encode part of the active enzyme, regulatory 
protein or protein which affects the structural 
integrity of a cell may be from a different organism. 
The gene sequences do not have to be plant derived and 
include genes from microbial or other sources. For 
example, the gene sequences may be Arabidopsis 
endogenous sequences in maize or tomato parent plants. 
Also, the gene sequences may be those which, in 
combination with a tissue specific promoter, are 
expressed in a tissue in which the gene sequences are 
not normally expressed. 

According to a second aspect of the invention there is 
provided a method for producing a plant having a desired 
phenotype of an active enzyme, a regulatory protein or a 
protein which affects the structural integrity of a 
cell, the method comprising crossing a first plant line 
with a second plant line wherein the first line contains 
one or more gene sequences encoding a polypeptide A 
which is part of an active enzyme, regulatory protein or 
protein which affects the structural integrity of a cell 
but which line does not have the phenotype and wherein 
the second line contains one or more gene sequences 
encoding a polypeptide B which is complementary to the 
polypeptide or protein A but which line does not have 
the desired phenotype. Here, the term "complementary" 
means that when expressed in the same plant the 
polypeptides A and B associate, interact or come 
together to form the phenotype of an active enzyme, a 
regulatory protein or protein which affects the 
structural integrity of a cell. 

Such a method may incorporate one or more of the 
features described above for the first aspect of the 
invention and the invention contemplates the application 
of these aspects according to the second aspect of the 
invention. 






According to a third aspect of the invention there is 
provided a seed or plant obtainable from a pair of 
plants according to the first aspect of the invention or 
by a method according to the second aspect of the 
invention. 


According to a fourth aspect of the invention there is 
provided a seed or plant having a phenotype of an active 
enzyme, regulatory protein or protein which affects the 
structural integrity of the cell, which is caused by the 
combined action of two or more transgenes, the 
transgenes not being present on the same copy of a 
chromosome. The preferred embodiments of the first, 
second and third aspects of the invention also apply to 
the fourth aspect. This means that the two or more 
transgenes are either on different chromosomes, or on 
different copies of the same chromosome, i.e. the plant 
is made by a cross. 

The invention will now be described by the following 
non-limiting Examples: 

EXAMPLE 1 

Splitting the Barnase Gene into Two Components (Figure 1) 

The results of Sancho and Ferscht, 1992, J.Mol.Biol., 
224, 741-747 show that Barnase activity can be obtained 
by combining a peptide A containing amino acids 1 to 36 
of the mature Barnase protein and peptide B containing 
amino acids 37 to 110 of the mature Barnase protein. 
The allele of Barnase which is described in Sancho and 
Ferscht is a mutant which has a methionine at position 
36, allowing cyanogen bromide to cleave between 35 and 
37 and produce the 2 peptides. The following genetic 
constructs, to express the peptides, were prepared: 

Peptide A: 

i. A Barnase gene with a methionine codon (amino acid 
position -1) added before codon 1 of the mature Barnase 
sequence so that translation can take place as described 
in Paul et al, 1992, Plant Mol.Biol., 19, 611-622. 

ii. An ORF coding for a peptide called A*, containing a 
methionine followed by amino acids 1 to 35 of mature 
Barnase protein followed by an Ochre stop codon. 

iii. A gene made of ORF A* under control of the A9 
promoter by using IPCR on our plasmid p3079, which 
contains the AMS gene pA9-Barnase (as in i. above)-
Barstar-CaMV 3' region. (See Figure 2). 

Plasmid p3079 was constructed by cloning a fragment 
containing the ORFs for Barnase-Barstar, obtained by PCR 
from pWP127 (Paul et al, 1992, supra), in our plasmid 
p1415, which is a derivative of pWP91 (WO-A-9211379) 
where the EcoRV restriction site has been converted to 
HindIII. IPCR was then performed on p3079 using primers 
B3 and B4 (see Figures 1 and 2) designed so that the 
sequence between codon 36 of Barnase and stop codon of 
Barstar is not part of the amplified product. The IPCR 
amplified sequence was then circularised by ligation and 
the resulting plasmid was introduced into E. coli. The 
plasmid was then prepared, cut with EcoRI and the 
fragment containing the ORF A* was cloned in the EcoRI 
sites of p1415, so that ORF A* would be under the 
control of the A9 promoter from a sequence not treated 
by PCR. The resulting plasmid p2022 contains ORF A* in 
the A9 expression cassette. 

iv. An ORF coding for a peptide called A**, comprised 
of a start methionine codon followed by amino acids 1 to 
36 of the mutant Barnase (Sancho and Ferscht, 1992, 
supra) but lacking a stop codon. 

This was obtained by PCR on template p2022 with primers 
B5 (retaining the XbaI site at the 5' end) and B6 
generating a blunt 3' end. 

v. A gene made of the translational fusion of ORF A** 
and the ORF of (Gly4 Ser)3/GUS under the control of the 
A9 promoter, the product of which shows peptide A fused 
in frame to the N-terminus of (Gly4 Ser)3/GUS (Figure 
1D). 

This was obtained by replacing the S-peptide ORF in 
plasmid p2028 (see Example 3) with the ORF of peptide 
A** (iv). For ORF replacement an IPCR was performed on 
plasmid p2028 using primers B7 (retaining the XbaI site 
at the 5' end) and B8 (generating a blunt 3' end) to 
delete the region encoding the S-peptide from the 
S-peptide-GUS translational fusion. After digest with 
XbaI, the PCR fragment encoding peptide A** (iv) was 
inserted XbaI/blunt into the acceptor DNA generated by 
IPCR. 

vi. An ORF coding for peptide A***, essentially 
identical to peptide A** (iv) but lacking a methionine 
start codon and containing an amber stop codon. 

This was obtained by PCR on template p2022 using primers 
B9 (producing a blunt 5' end) and B10 (introducing a 
BamHI site at the 3' end). The 3' end of the PCR 
product was digested with BamHI for construction of the 
ubiquitin-peptide A*** translational fusion (below). 

vii. A gene made of the translational fusion of genomic 
DNA encoding ubiquitin and the ORF A*** under the 
control of the A9 promoter, the product of which shows 
peptide A*** fused in frame to the C-terminus of 
ubiquitin (Figure 1E). 

The genomic DNA encoding ubiquitin was obtained by PCR 
from chromosomal DNA of Arabidopsis thaliana. The PCR 
primers Ubq1SF and Ubq1R were designed to amplify the 
ubiquitin encoding sequence from the extension protein 
gene UBQ1, first described by Callis et al. (1990, 
supra). Restriction sites for XbaI (at 5' end) and 
BamHI (at 3' end), introduced during thermocycling, were 
used to clone the PCR fragment under the control of the 
A9 promoter of p1415 digested with XbaI and BamHI to 
yield plasmid p3245. IPCR was then performed on p3245 
using primers UBQ1a and UBQ1b to generate a blunt 
acceptor end immediately after the ubiquitin codon Gly 
76 and at the 3' end to reconstitute the BamHI 
restriction site for cloning. After BamHI digest this 
construct served as an acceptor for the PCR fragment 
encoding peptide A*** (vi). 

Peptide B: 

i. An ORF coding for a peptide called B* which starts 
with a methionine codon followed by codons 37 to 110 of 
the mature Barnase sequence. In effect this transfers 
the methionine 36 of the mutant Barnase gene (Sancho and 
Ferscht, 1992, supra) from peptide A to peptide B, 
yielding peptides A* and B*. 

ii. Gene for ORF B* containing the ATG (amino acid 
position -1) of Barnase (in p3079) fused to codon 37 of 
Barnase, under control of the A9 promoter, by deleting 
(by IPCR with suitable primers (see below)) codons 1 to 
36 of the mature Barnase sequence. 



This was done by performing on p3079 an IPCR reaction 
using primers B1 and B2 (Figures 1 and 2) designed so 
that the sequence between codon 2 and codon 36 of 
Barnase is not part of the amplified product (see Figure 
2). The IPCR product is treated as described above for 
ORF A*, and cloned under control of the A9 promoter in 
p1415. The resulting plasmid p2023 contains ORF 
B*-Barstar in the A9 expression cassette. 

iii. An ORF encoding peptide B*** which differs from B* 
(i) in that it lacks the start methionine. 

iv. A gene made of the translational fusion of genomic 
DNA encoding ubiquitin and the ORF B*** under the 
control of the A9 promoter, the product of which shows 
peptide B*** fused in frame to the C-terminus of 
ubiquitin (Figure 1F). 

IPCR was performed on plasmid p2023 (above) with primers 
B11 and B12, retaining the XbaI site at the 5' end of B* 
but removing the ATG start and leaving a blunt 3' end. 
After digest with XbaI, the IPCR product served as an 
acceptor for the ubiquitin encoding DNA. The latter 
sequence was obtained by PCR from plasmid p3245 (above) 
with primers Ubq16F and Ubq1b retaining an XbaI site at 
the 5' end while leaving the 3' end blunt. After digest 
with XbaI, the IPCR and the PCR product were ligated to 
yield the translational fusion shown. 

In Figure 1G: The nucleotide sequences of primers are 
listed which were used for PCR and IPCR, respectively. 

In Fig. 2: Circular plasmid p3079, containing the 
A9-driven barnase/barstar gene (Figure 1) in p1415, served 
as template for Inverse PCR. As the PCR primers (Figure 
1) pointed into opposite directions, the IPCR yielded a 
linear double-stranded plasmid DNA from which the region 
in between the 5' ends of the annealed PCR primers was 
deleted (below). Intramolecular ligation would then 
yield circular deletion plasmids which were introduced 
into E. coli for further subcloning. 
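
The inverse PCR deletion described for Fig. 2 can be sketched as a string operation on a circular sequence. This is a simplified illustration, not the actual cloning procedure: the function names are hypothetical, the plasmid is represented by one strand written from an arbitrary origin, and the deleted region is given by 0-based slice indices rather than by primer sequences.

```python
def inverse_pcr_delete(plasmid: str, del_start: int, del_end: int) -> str:
    """Linear IPCR product of a circular plasmid lacking plasmid[del_start:del_end].

    The two primers point outward from the region to be deleted, so
    amplification runs around the circle and skips that region.
    """
    if not 0 <= del_start < del_end <= len(plasmid):
        raise ValueError("deleted region must lie inside the plasmid")
    # Start right after the deleted region, wrap around the origin,
    # and stop right before the deleted region.
    return plasmid[del_end:] + plasmid[:del_start]


def same_circle(a: str, b: str) -> bool:
    """True if a and b are the same circular sequence written from different origins."""
    return len(a) == len(b) and a in b + b
```

For the toy plasmid `AAACCCGGGTTT`, deleting the slice 3 to 6 (`CCC`) yields the linear product `GGGTTTAAA`; after intramolecular ligation this is the same circle as `AAAGGGTTT`, which `same_circle` confirms.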

Also in Fig. 2: 

lane 1: 

A schematic (not to scale) representation is shown of 
plasmid p3079. The different structural parts of the 
coding regions are highlighted. ATG and TAA represent 
the start and stop codon of barnase and barstar, 
respectively. The relative positions of codons 35, 36 
and 37 of the mature Barnase protein are indicated. 

lane 2: 

IPCR with primers B1 and B2 deleted codons 1 to 36 of 
the mature Barnase protein. Intramolecular ligation of 
the linear deletion plasmid then fused the ATG start 
codon to codon 37 yielding the pepB*/barstar region. 

lane 3: 

IPCR with primers B3 and B4 deleted the sequence 
downstream of the barnase codon 35 as indicated. 
Intramolecular ligation of the linear deletion plasmid 
then fused the barnase codon 35 to the barstar stop 
codon yielding the pepA* sequence. 

EXAMPLE 2 

Plant Transformation with the Genetic Constructs in 



Genes pA9-A* and pA9-B* expressing a polypeptide A and a 
polypeptide B from the A9 promoter (WO92/11379) were 
cloned into derivatives of the plant transformation 
vector pBin19 (Bevan et al., 1984, Nucl. Ac. Res. 12, 
8711; Frisch et al., 1995, Plant Mol. Biol., 27, 405-409) 
and Arabidopsis plants containing pA9-polypeptide A, or 
pA9-polypeptide B, or both genes, were obtained. Plants 
containing both genes were male sterile, whereas plants 
containing one gene were unaffected by the transgene. 
Plants with one gene were allowed to self, their progeny 
was harvested, and was analysed to identify homozygous 
and heterozygous T1 plants. T1 plants with 
pA9-polypeptide A were crossed with T1 plants with 
pA9-polypeptide B. The hybrid seeds obtained displayed the 
predicted phenotype: wild type if containing one gene 
only, and male sterile when containing the two genes. 

Genes are introduced into maize and into tomato by 
biolistic or Agrobacterium-mediated transformation, and 
plants are regenerated and assessed for male fertility 
in a similar way (Mornish et al., 1990, Bio/Technology 
8, 833-839 and Fillatti et al., 1987, Bio/Technology 5, 
726-730). 

EXAMPLE 3 

Splitting an RNaseA Gene into Two Components 
(Figures 3 and 4) 

From the work of Richards and Vithayathil (1959, supra), 
we know that the enzyme RNaseA can be cleaved (by the 
protease subtilisin) to generate two polypeptides: the 
S-peptide contains amino acids 1 to 20 of RNaseA, and 
the S-protein contains amino acids 21 to 124 of RNaseA. 
When combined, the S-peptide and the S-protein 
associate, and reconstitute an active enzyme. The last 
5 amino acids of the S-peptide are not needed for 
reconstituting RNaseA: a smaller S-peptide made of amino 
acids 1 to 15 is sufficient. Genes which express the 
S-peptide and the S-protein under control of the A9 
promoter were used to develop a system according to the 
invention. 
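
The residue ranges quoted above can be checked with a small helper that cuts a sequence by 1-based, inclusive positions. The sketch is illustrative only: `fragment` is a hypothetical function, and the 124-residue string built in the usage note is a placeholder, not the RNaseA sequence.

```python
def fragment(sequence: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive) of a protein sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range lies outside the sequence")
    return sequence[first - 1:last]


# Placeholder of 124 residues standing in for mature RNaseA.
placeholder = "".join(chr(65 + i % 26) for i in range(124))

s_peptide = fragment(placeholder, 1, 20)          # amino acids 1 to 20
s_protein = fragment(placeholder, 21, 124)        # amino acids 21 to 124
short_s_peptide = fragment(placeholder, 1, 15)    # amino acids 1 to 15
s_protein_plus5 = fragment(placeholder, 16, 124)  # amino acids 16 to 124
```

In both pairings (1 to 20 with 21 to 124, and 1 to 15 with 16 to 124) the two fragments together cover the whole 124-residue sequence without overlap, which is what the constructs listed in this Example rely on.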

The starting material was a synthetic gene coding for 
bovine pancreatic RNaseA (Vasantha and Filpula, 1989, 
Gene 76, 53-60). A gene coding for the ORF of RNaseA was 
made using synthetic oligonucleotides (see Figures 3A 
and 3B). The nucleotide sequence of the gene was 
designed to be compatible with maize codon usage, 
according to Fennoy and Bailey-Serres, 1993, Nuc. Acids 
Res., 21, 5294-5300. PCR with suitable primers was 
used to amplify from the full length ORF. The following 
ORFs were built: 

S-peptide: 

i. An ORF for the S-peptide containing a methionine 
translation initiation codon followed by codons 1 to 15 
of the mature RNaseA sequence (see Figures 4A and 4B) 
and terminated by an Ochre stop codon. 

ii. An ORF made of a methionine translation initiation 
codon followed by codons 1 to 15 of the mature RNaseA 
sequence, followed by a linker sequence encoding 
(Gly4-Ser)3 (see Figures 4A and 4B). This gene was designed 
so that it can be fused in frame to the ORF of the GUS 
protein by cloning in the BamHI site of plasmid p2027 
which contains the GUS gene from pBI101.3 (Jefferson, 
1987, Plant Mol. Biol. Reporter, 5, 387-405). 

iii. A translational fusion comprising the ORF of the 
mitochondrial protein targeting sequence of AOX3 protein 
from soybean (Finnegan and Day, 1997, Plant Physiol. 
114, pp455) and the ORF of S-peptide as described in (i) 
but lacking the methionine translation initiation codon 
(Figure 4C). The gene product of said translational 
fusion shows the S-peptide fused to the C-terminal end 
of the targeting sequence. 

iv. A translational fusion comprising the ORF of the 
mitochondrial protein targeting sequence of AOX3 protein 
(supra) and the ORF of the S-peptide-GUS fusion as 
described in (ii) but lacking the methionine 
translational initiation codon (Figure 4C). The gene 
product of said translational fusion shows the 
S-peptide-GUS protein fused to the C-terminal end of the 
targeting sequence. 

S-protein: 

i. An ORF for the "S-protein +5", which contains a 
methionine translation initiation codon followed by 
codons 16 to 124 of mature RNaseA sequence and is 
terminated by an Ochre codon. 

ii. An ORF for the S-protein which contains a 
methionine translation initiation codon followed by 
codons 21 to 124 of mature RNaseA sequence and is 
terminated by an Ochre codon. 

iii. A translational fusion comprising the ORF of the 
mitochondrial protein targeting sequence of AOX3 protein 
(supra) and the ORF of the S-protein as described in 
(ii) but lacking the methionine translational 
initiation codon (Figure 4C). The gene product of said 
translational fusion shows the S-protein fused to the 
C-terminal end of the targeting sequence. 

iv. A translational fusion comprising genomic DNA 
encoding ubiquitin and the ORF of the S-protein as 
described in (ii) but lacking the methionine 
translational initiation codon (Figure 4C). The gene 
product of said translational fusion shows the S-protein 
fused to the C-terminus of ubiquitin. 

Genes under control of the A9 promoter were then built 
and introduced into plants as in Example 2. 

In Fig. 3A: The sequences encoding the S-peptide, the 
S(+5)-protein and the S-protein were constructed by 
first aligning sense oligonucleotides RN-I to RN-VII 
(lanes 2, 5, 7, 9, 11, 13, 16) along complementary guide 
oligonucleotides RN-1 to RN-6 (lanes 3, 6, 8, 10, 12, 
14) and then selectively ligating the correctly aligned 
sense oligonucleotides using Taq-DNA-Ligase. 

The ligation resulted in a continuous single DNA strand 
(sense) which was subsequently amplified by Vent DNA 
polymerase (25 PCR cycles) using one of two primer pairs 
as follows: (i) Primers RN-a (lane 1) and RN-b (lane 15) 
amplified the full ligation product. The PCR product 
was gel purified and cleaved with restriction enzymes 
BamHI (underlined, lanes 1 and 15) and BglII (underlined, 
lanes 2 and 4) to yield two DNA fragments encoding the 
S-peptide and the S(+5) protein. The two fragments were 
cloned separately into the BamHI site downstream of the 
pA9 promoter in plasmid p1415 to yield plasmids p4837 
(S-peptide) and p4838 (S+5 protein). (ii) Primers RN-d 
(lane 4) and RN-b (lane 15) amplified the coding 
sequence of the S-protein. The PCR product was cloned 
as described in (i) to yield plasmid p4839 (S-protein). 



lane 1: PCR primer (sense) RN-a 
lane 2: Oligonucleotide RN-I and alignment to 
oligonucleotide RN-II 
lane 3: Guide oligonucleotide RN-1 (antisense) 
lane 4: PCR primer (sense) RN-d 
lane 5: Oligonucleotide RN-II (continued from lane 2) 
and alignment to oligonucleotide RN-IIIN 
lane 6: Guide oligonucleotide RN-2N (antisense) 
lane 7: Oligonucleotide RN-IIIN (continued from lane 
5) and alignment to oligonucleotide RN-IV 
lane 8: Guide oligonucleotide RN-3 (antisense) 
lane 9: Oligonucleotide RN-IV (continued from lane 7) 
and alignment to oligonucleotide RN-V 
lane 10: Guide oligonucleotide RN-4 (antisense) 
lane 11: Oligonucleotide RN-V (continued from lane 9) 
and alignment to oligonucleotide RN-VI 
lane 12: Guide oligonucleotide RN-5 (antisense) 
lane 13: Oligonucleotide RN-VI (continued from lane 11) 
and alignment to oligonucleotide RN-VII 
lane 14: Guide oligonucleotide RN-6 (antisense) 
lane 15: PCR primer (antisense) RN-b 
lane 16: Oligonucleotide RN-VII (continued from lane 
13) 

Symbols: 

(5'): non-phosphorylated 5' end 
(5P): phosphorylated 5' end 
(3OH): conventional 3' end 
(small letters): bases added for the convenience of 
cloning. 
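
The guide-directed assembly described for Fig. 3A can be modelled in a simplified way: two neighbouring sense oligonucleotides are joined only if an antisense guide is complementary to the junction between them. The sketch below is a toy model with hypothetical function names and made-up sequences; it ignores phosphorylation, annealing temperature and ligase specificity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def guide_bridges(left: str, right: str, guide: str) -> bool:
    """True if the antisense guide pairs with the end of `left` and the start of `right`."""
    target = revcomp(guide)  # sense-strand stretch that the guide anneals to
    for k in range(1, len(target)):
        if left.endswith(target[:k]) and right.startswith(target[k:]):
            return True
    return False


def ligate(sense_oligos: list[str], guides: list[str]) -> str:
    """Join sense oligonucleotides in order, requiring a bridging guide at every junction."""
    if len(guides) != len(sense_oligos) - 1:
        raise ValueError("one guide is needed per junction")
    for left, right, guide in zip(sense_oligos, sense_oligos[1:], guides):
        if not guide_bridges(left, right, guide):
            raise ValueError("guide does not bridge the junction")
    return "".join(sense_oligos)
```

With three toy oligonucleotides `ATGAAAGAA`, `ACCGCTGCA` and `GCTAAGTTC`, guides built as `revcomp("GAAACC")` and `revcomp("GCAGCT")` span the two junctions, so `ligate` returns the continuous sense strand; a guide that does not match a junction raises an error instead.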



In Fig. 3B: The sequences encoding the S-peptide with 
the (Gly4 Ser)3-linker peptide were constructed by first 
aligning sense oligonucleotides RN-I and RN-VIII (lanes 
2 and 4) along the complementary guide oligonucleotide 
RN-7, and then selectively ligating the correctly 
aligned oligonucleotides using Taq-DNA-Ligase. 

The ligation resulted in a continuous single DNA strand 
which was subsequently amplified by Vent DNA polymerase 
(25 PCR cycles) using the primer pair RN-a (lane 1) and 
RN-c (lane 5). This PCR reaction yielded the full 
length, double stranded ligation product. The PCR 
product was gel purified, then cleaved with restriction 
enzymes BamHI (underlined, lane 1) and BglII 
(underlined, lane 5) and cloned into the BamHI site of 
p2027 to generate an NH2-terminal protein fusion to GUS 
under the control of the pA9 promoter (p2027 was 
constructed by cloning the GUS coding sequence of 
pBI101.3 as a BamHI/SmaI fragment into the BamHI site of 
p1415). This yielded plasmid p2028. 

lane 1: PCR primer (sense) RN-a 
lane 2: Oligonucleotide RN-I encoding the S-peptide as 
in Figure 3A and the alignment to 
oligonucleotide RN-VIII encoding the 
(Gly4-Ser)3 linker peptide 
lane 3: Guide oligonucleotide (antisense) RN-7 
lane 4: Oligonucleotide RN-VIII (continued from lane 
2) 
lane 5: PCR primer (antisense) RN-c 

Symbols: 

(5'): non-phosphorylated 5' end 
(5P): phosphorylated 5' end 
(3OH): conventional 3' end 
(small letters): bases added for the convenience of 
cloning 

In Figure 4A: The protein and DNA sequence is shown for 
S-peptide and the S-peptide with (Gly4 Ser)3 linker. 
The S-peptide linker sequence was fused in frame to GUS 
to yield plasmid p2028 as described for Figure 3B. 

In Figure 4B: The ORF for (S+5) -protein and S-protein is 
shown as contained in plasmids p4838 and p4839, 
respectively. These plasmids were described above for 
Figure 3A. 

In Figure 4C: 

(i) The mitochondrial protein targeting sequence (short 
of the last four amino acids: Leu-Arg-Arg-Met) was 
obtained by PCR with primers AOX3MI1 and AOX3MI2 from a 
plasmid which contained the cDNA of Alternative Oxidase 
(AOX3) of soybean as published by Finnegan and Day, 1997 
(Plant Physiol. 114, pp455). Restriction sites (XbaI 
and BglII at the 5' end and AflII and BamHI at the 3' 
end) were introduced during the thermocycling to yield 
the PCR product which was cloned XbaI/BamHI downstream 
of the A9 promoter in p1415. This plasmid was called 
p0200. 

(ii) Primers SPEPMI1 and SPEPMI2 were then used to 
produce from plasmid p4837 a PCR fragment encoding 
within and downstream of an AflII restriction site the 
missing four amino acids (Leu-Arg-Arg-Met) of the 
mitochondrial targeting signal followed by the ORF of 
S-peptide. A PCR generated BamHI site at the 3' end 
allowed cloning of the PCR fragment as an AflII/BamHI 
fragment into p0200. This cloning yielded plasmid 
p0203, containing the complete ORF of the translational 
fusion as shown. 

(iii) The translational fusion of mitochondrial targeting 
sequence and ORF of S-peptide-GUS was generated in a 
similar fashion as described in (ii) except that PCR 
primers SPEPMI1 and SPEPMI2 were used on template p2028 
to generate an AflII/BamHI fragment that was cloned into 
p0200 to yield p0204. 

(iv) The translational fusion of mitochondrial targeting 
sequence and ORF of S-protein was generated in a similar 
fashion as described in (ii and iii), except that PCR 
primers SPROTMI1 and SPROTMI2 were used on template p4838 
to generate an AflII/BamHI fragment that was cloned into 
p0200 to yield p0202. 

(v) A PCR fragment was generated from template p4839 
with primers SPROTF and SPROTR containing the ORF of 
S-protein in between BamHI restriction sites at either 
end. After digestion with BamHI this PCR fragment was 
cloned into the BamHI site of p3245 which yielded the 
translational fusion in p3249 of genomic ubiquitin DNA 
and S-protein as shown. 

EXAMPLE 4 

Use of the Dimer Protein Apetala3-Pistillata 

Apetala3 (Ap3) and Pistillata (Pi) are two proteins of 
Arabidopsis thaliana which are involved in the 
regulation of floral differentiation. The genes are 
known while the endogenous pattern of expression in the 
tapetum is not known. Expression of the Arabidopsis 
genes in the maize tapetum leads to disruption of the 
normal anther development by activating normally silent 
genes. These genes can also be used to activate, in the 
maize tapetum, an Arabidopsis promoter responsive to the 
Ap3-Pi dimer such as the Ap3 promoter (pAp3) itself. 

We have built the following genes: 

pA9-Apetala3 

The cDNA for Ap3 (Jack et al, 1992, Cell 68, 683-697; 
GenBank Accession No. M86357) was cloned in the A9 
expression cassette of pWP91 (WO-A-9211379) giving 
plasmid p4796. This plasmid contains the Ap3 cDNA with 
approximately 15 bases of 5' untranslated sequence 
followed by the whole ORF (698 bases from ATG to TAA) 
followed by approximately 120 bases of 3' untranslated 
sequence, cloned in the BamHI site of pWP91. 

pA9-Pistillata 

The cDNA for Pi (Goto and Meyerowitz, 1994, Genes Dev. 
8, 1548-1560; GenBank Accession No. D30807) was cloned in 
the A9 expression cassette of pWP91 (WO-A-9211379) 
giving plasmid p0180. This plasmid contains the Pi cDNA 
with approximately 24 bases of 5' untranslated sequence 
followed by the whole ORF (626 bases from ATG to TGA) 
followed by approximately 250 bases of 3' untranslated 
sequence, cloned in the XbaI-BamHI sites of pWP91. 

pApetala3-PRGlucanase 

The A9 promoter sequence in plasmid A9PR (described in 
Worrall et al, 1992, The Plant Cell, 4, 759-771) was 
replaced by a 1250 bp (approx) sequence containing the 
Ap3 promoter region, obtained by PCR amplification of 
Arabidopsis thaliana genomic DNA, according to the 
published sequence (Jack et al, 1994, Cell, 76, 703-716), 
giving plasmid p4817. 

The genes were introduced in maize in various 
combinations, by biolistic transformation techniques 
known in the art. Plants were regenerated and assessed 
for male fertility. 

- p4796 (pA9-Ap3)/p0180 (pA9-Pi) cause male sterility. 
Neither of them alone causes male sterility. 
- p4796/p0180/p4817 (pAp3-PRGlucanase) cause sterility, 
whereas p4817 with only one of the two transcription 
factor genes does not. 

CLAIMS 

1. A pair of parent plants for producing seeds 
comprising: 

(i) a first parent plant containing one or more 
gene sequences encoding a polypeptide or protein A; 
and 

(ii) a second parent plant containing one or more 
gene sequences encoding a polypeptide or protein B; 

wherein the polypeptides A, B, when expressed in 
separate plants, do not form an active enzyme, a 
regulatory protein or protein which affects the 
functionality and/or viability and/or the structural 
integrity of a cell, but when expressed in the same 
plant do form an active enzyme, regulatory protein, or 
protein which affects the structural integrity of a 
cell. 

2. A pair of plants as claimed in claim 1 wherein the 
one or more gene sequences from at least one of the 
parents is transgenic. 

3. A pair of plants as claimed in claim 1 or claim 2 
wherein the polypeptides or proteins A, B, when 
expressed in the same plant, cause cell ablation, 
especially male-sterility or embryoless seeds. 

4. A pair of plants as claimed in any one of claims 1 
to 3 wherein one of the parent plants is male-sterile. 

5. A pair of plants as claimed in any one of claims 2 
to 4 wherein the one or more gene sequences encoding 
both or one of the polypeptides or proteins A, B, is 
operatively linked to a tissue specific promoter. 

6. A pair of plants as claimed in any one of claims 1 
to 5 wherein the polypeptides A, B are naturally 
occurring subunits of the protein complex of an active 
enzyme, regulatory protein, or protein which affects the 
structural integrity of a cell. 

7. A pair of plants as claimed in claim 6 wherein the
polypeptides A, B are two polypeptide subunits of an
enzyme having RNase activity such as the enzyme Barnase
or RNase A or the monomers of the
Apetala3-pistillata protein complex.

8. A pair of plants as claimed in any one of claims 1
to 5 wherein the polypeptides A, B are artificially 
split polypeptides of an active enzyme, regulatory 
protein or protein which affects the structural 
integrity of a cell. 

9. A pair of plants as claimed in any one of the
preceding claims wherein each parent plant is homozygous 
with respect to the one or more gene sequences encoding 
polypeptide A or B respectively. 

10. A pair of plants as claimed in any one of claims 3
to 9 wherein the cause of male-sterility is direct or
indirect.

11. A pair of plants as claimed in any one of claims 5
to 10 wherein the tissue-specific promoter is a tapetum-
specific promoter, an embryo-specific promoter or a seed-
specific promoter.

12. A pair of plants as claimed in any one of claims 1
to 11 wherein one or both of the polypeptides or
proteins is fused to a carrier protein and/or a protein
targeting signal.

13. A pair of plants as claimed in any one of claims 1
to 12 wherein each polypeptide or protein A, B is linked
to a protein dimerisation domain of a dimeric or
multimeric protein sequence that promotes association
between subunits A and B.

14. A pair of plants as claimed in any one of the
preceding claims wherein the one or more gene sequences
from at least one of the parent plants is a heterologous
gene sequence.

15. A method for producing a plant having a desired
phenotype by virtue of an active enzyme, a regulatory
protein or a protein which affects the structural
integrity of a cell, the method comprising crossing a
first line with a second line wherein the first line
contains one or more gene sequences encoding a
polypeptide or protein A but which line does not have the
desired phenotype and wherein the second line contains
one or more gene sequences encoding a polypeptide or
protein B which is complementary to the polypeptide or
protein A but which line does not have the desired
phenotype.

16. A method as claimed in claim 15 wherein the one or
more gene sequences from at least one of the lines is
transgenic.

17. A method as claimed in claim 15 or claim 16 wherein
the desired phenotype is cell ablation, especially male-
sterility or embryoless seeds.

18. A method as claimed in any one of claims 15 to 17
wherein one of the lines is male-sterile. 

19. A method as claimed in any one of claims 15 to 18
wherein the one or more gene sequences encoding
polypeptides or proteins A and/or B is operatively linked
to a tissue-specific promoter.

20. A method as claimed in any one of claims 15 to 19
wherein the polypeptides or proteins A, B are naturally
occurring subunits of an active enzyme, regulatory
protein or protein which affects the structural
integrity of a cell.

21. A method as claimed in claim 20 wherein the
polypeptides or proteins A, B are two polypeptide
subunits of an enzyme having RNase activity such as the
enzyme Barnase, RNase A or the subunits of the protein
Apetala3-pistillata.

22. A method as claimed in any one of claims 15 to 19
wherein the polypeptides or proteins A, B are
artificially split polypeptides of an active enzyme,
regulatory protein or protein which affects the
structural integrity of a cell.

23. A method as claimed in any one of claims 15 to 22
wherein each line is homozygous with respect to the gene
sequence encoding polypeptide or protein A, B,
respectively.

24. A method as claimed in any one of claims 15 to 23
wherein the desired phenotypic trait is direct or
indirect male-sterility.

25. A method as claimed in any one of claims 15 to
24 wherein the tissue-specific promoter is a tapetum-
specific promoter, an embryo-specific promoter or a seed-
specific promoter.

26. A method as claimed in any one of claims 15 to 25
wherein one or both of the polypeptides or proteins A, B
is fused to a carrier protein and/or a protein targeting
signal.

27. A method as claimed in any one of claims 15 to 26
wherein each polypeptide or protein A, B is linked to a
different protein dimerisation domain of a dimeric or
multimeric protein.

28. A method as claimed in any one of claims 15 to 27 
wherein at least one of the lines contains, as the one 
or more gene sequences, heterologous gene sequences. 


29. A seed or plant obtainable from a pair of plants as
claimed in any one of claims 1 to 14 or by a method as
claimed in any one of claims 15 to 28.

30. A seed or plant, having a phenotype of an active
enzyme, regulatory protein or protein which affects the 
integrity of a cell, which is caused by the combined 
action of two or more transgenes, not present on the 
same copy of a chromosome. 
PROTEIN COMPLEMENTATION IN TRANSGENIC PLANTS
Abstract of the Disclosure

The invention relates to pairs of parent plants for
producing hybrid seeds and to methods for producing plants
with a desired phenotype. The desired phenotype is an
active enzyme, a regulatory protein or a protein which
affects the functionality and/or viability and/or
structural integrity of a cell. Preferably, the desired
phenotype is substantially absent from the parent
plants/lines. In particular, the invention relates to
parent plants and methods involving plant lines for
producing male-sterile plants and seeds.
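
The crossing scheme summarised above can be illustrated with a small Mendelian sketch. It assumes two unlinked loci, one carrying the gene for polypeptide A and one carrying the gene for polypeptide B, and it assumes that the phenotype appears only when a plant holds at least one copy of each. The function names and genotype encoding are hypothetical and are not taken from the application.

```python
from itertools import product

def gametes(genotype):
    """Return the gametes of a parent as (has_A, has_B) tuples.

    genotype is ((a1, a2), (b1, b2)): two chromosome copies at each of
    two unlinked loci, True meaning the transgene is present on that copy.
    """
    locus_a, locus_b = genotype
    return list(product(locus_a, locus_b))

def fraction_with_both(parent1, parent2):
    """Fraction of F1 progeny carrying at least one copy of A and of B."""
    offspring = [
        (g1[0] or g2[0], g1[1] or g2[1])
        for g1, g2 in product(gametes(parent1), gametes(parent2))
    ]
    both = sum(1 for has_a, has_b in offspring if has_a and has_b)
    return both / len(offspring)

# Assumed example genotypes (illustrative only).
HOMOZYGOUS_A = ((True, True), (False, False))
HOMOZYGOUS_B = ((False, False), (True, True))
HEMIZYGOUS_A = ((True, False), (False, False))
HEMIZYGOUS_B = ((False, False), (True, False))

if __name__ == "__main__":
    print(fraction_with_both(HOMOZYGOUS_A, HOMOZYGOUS_B))
    print(fraction_with_both(HEMIZYGOUS_A, HEMIZYGOUS_B))
```

Under these assumptions, crossing two homozygous parents gives progeny that all carry both genes, whereas crossing two hemizygous parents gives only a quarter of such progeny, which is one way to read the preference for homozygous parent lines in the claims.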
FIG. 4C 10/12

i. PCR amplification product encoding a partial AOX3 targeting signal

XbaI / BglII

tctagatcttaac ATGAAGAATG TTTTAGTAAG GTCAGCTGCG CGAGCTCTGC TTGGCGGCGG
TGGGCGGAGC TACTACCGCC AGCTCTCAAC GGCGGCGATC GTGGAACAGA
GACACCAGCA CGGTGGCGGC GCGTTTGGAA GCTTCCA cttaagcggatcc

AflII / BamHI



ii. ORF encoding AOX3 targeting sequence (underlined) and S-peptide

ATGAAGAATG TTTTAGTAAG GTCAGCTGCG CGAGCTCTGC TTGGCGGCGG TGGGCGGAGC
TACTACCGCC AGCTCTCAAC GGCGGCGATC GTGGAACAGA GACACCAGCA CGGTGGCGGC
GCGTTTGGAA GCTTCCACTT AAGAAGGATG AAGGAGACCG CCGCCGCCAA GTTCGAGCGC
CAGCACATGG ACAGCTAA



iii. ORF encoding AOX3 targeting sequence (underlined) 
and S-peptide-(Gly4 Ser)3-GUS 

ATGAAGAATG TTTTAGTAAG GTCAGCTGCG CGAGCTCTGC TTGGCGGCGG TGGGCGGAGC
TACTACCGCC AGCTCTCAAC GGCGGCGATC GTGGAACAGA GACACCAGCA CGGTGGCGGC
GCGTTTGGAA GCTTCCACTT AAGAAGGATG AAGGAGACCG CCGCCGCCAA GTTCGAGCGC
CAGCACATGG ACAGCGGCGG TGGCGGTTCC GGTGGCGGTG GCAGCGGCGG CGGTGGTAGC
GGGATCCCCG GGTACGGTCA GTCCCTTATG --> GUS



iv. ORF encoding AOX3 targeting sequence (underlined) 
and S-protein 

ATGAAGAATG TTTTAGTAAG GTCAGCTGCG CGAGCTCTGC TTGGCGGCGG TGGGCGGAGC
TACTACCGCC AGCTCTCAAC GGCGGCGATC GTGGAACAGA GACACCAGCA CGGTGGCGGC
GCGTTTGGAA GCTTCCACTT AAGAAGGATG AGCTCCTCCA ACTACTGCAA CCAGATGATG
AAGTCTAGGA ACCTGACCAA GGACAGGTGC AAGCCAGTCA ACACCTCCGT CCACGAGAGC 
CTGGCCGATG TCCAGGCCGT CTGCAGCCAG AAGAACGTGG CCTGCAAGAA CGGTCAGACC 
AACTGCTACC AGTCCTACAG CACCATGTCC ATCACCGACT GCCGCGAGAC CGGCTCCAGC 
AAGTACCCTA ACTGCGCCTA CAAGACCACA CAGGCCAACA AGCACATCAT TGTTGCCTGC 
GAGGGTAACC CTTACGTGCC TGTCCACTTC GACGCCTCCG TCTAA 



v. Translational fusion of Ubiquitin genomic sequence and ORF of S-protein

ATGCAGATCT TCGTGAAAAC CTTGACCGGC AAGACCATCA CTCTCGAGGT CGAGAGCAGC
GACACCATCG ACAATGTCAA GGCCAAGATC CAAGACAAAG AAGGTATCAT TCTTCCTCAC
TCAATCTGGA TTCTTCTCTT TAGCTTTTTG AAATTCAGAT CTCTTATCAT TTACTTGTTT
CTCCTTTAAG GAATCCCTCC GGATCAGCAG AGATTGATCT TCGCCGGAAA GCAGCTCGAA
GATGGCCGTA CTTTGGCTGA CTACAACATC CAGAAAGGTA CGAAATCATC CGAATCCTTC
TGTTGATCAT TTCGATGATC TGATTGTATA AACTCTAATG GATTGTTATC ATTTGTAAAC
AGAATCTACA CTTCATCTTG TGTTGAGGCT TAGAGGtGGa tcCagCTCCA ACTACTGCAA
CCAGATGATG AAGTCTAGGA ACCTGACCAA GGACAGGTGC AAGCCAGTCA ACACCTCCGT 
CCACGAGAGC CTGGCCGATG TCCAGGCCGT CTGCAGCCAG AAGAACGTGG CCTGCAAGAA 
CGGTCAGACC AACTGCTACC AGTCCTACAG CACCATGTCC ATCACCGACT GCCGCGAGAC 
CGGCTCCAGC AAGTACCCTA ACTGCGCCTA CAAGACCACA CAGGCCAACA AGCACATCAT 
TGTTGCCTGC GAGGGTAACC CTTACGTGCC TGTCCACTTC GACGCCTCCG TCTAA 

Underlined: introns A and B within the ubiquitin encoding sequence
Bold: codon for Glycine 76, marking the C-terminus of the ubiquitin

Small letters: PCR introduced conservative codon changes to generate a BamHI site 
and to modify the codon usage 
FIG. 4D

Nucleotide sequence of PCR primers (example 3)

SprotF 5' GGTGGATCCAGCTCCAACTACTGCAAC 3'

SprotR 5' CGGGATCCTTAGACGGAGGCGTCG 3'

SprotMI1 5' GTCCTTAAGAAGGATGAGCTCCTCCAACTAC 3'
SprotMI2 5' CGGGATCCTTAGACGGAGGCGTCG 3'

SpepMI1 5' GTCCTTAAGAAGGATGAAGGAGACCGCCG 3'
SpepMI2 5' TCGGGATCCTTAGCTGTCCATGTGCTG 3'
SpepGMI2 5' TCGGGATCCTCATTGTTTGCCTCCCTG 3'



AOX3MI1 5' TGCTCTAGATCTTAACATGAAGAATGTTTTAG 3'
AOX3MI2 5' TCGGATCCGCTTAAGTGGAAGCTTCCAAAC 3'
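
As a rough cross-check of the primer list, the sketch below scans the two AOX3 primers for the restriction sites named on the AOX3 amplification product (XbaI, BglII, AflII, BamHI) and computes the reverse complement of the reverse primer. The recognition sequences are the standard ones for these enzymes; the helper names are hypothetical and the pairing of primer names with sequences is assumed from their order in the figure.

```python
# Standard recognition sequences of the four enzymes named in the figure.
SITES = {
    "XbaI": "TCTAGA",
    "BglII": "AGATCT",
    "AflII": "CTTAAG",
    "BamHI": "GGATCC",
}

# Primer sequences as listed in the figure (5' to 3').
AOX3MI1 = "TGCTCTAGATCTTAACATGAAGAATGTTTTAG"
AOX3MI2 = "TCGGATCCGCTTAAGTGGAAGCTTCCAAAC"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def sites_in(seq):
    """List the names of the enzymes in SITES whose site occurs in seq."""
    upper = seq.upper()
    return [name for name, site in SITES.items() if site in upper]

if __name__ == "__main__":
    print("AOX3MI1 sites:", sites_in(AOX3MI1))
    print("AOX3MI2 sites:", sites_in(AOX3MI2))
    print("AOX3MI2 reverse complement:", reverse_complement(AOX3MI2))
```

The forward primer carries the XbaI and BglII sites ahead of the ATG, and the reverse complement of the reverse primer reads through the end of the targeting signal into the AflII and BamHI sites, in line with the site labels on the amplification product.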
FIGURE SHOWING A PRODUCTION SCHEME OF EMBRYOLESS MAIZE GRAINS:
LINES A AND B ARE SOWN IN ALTERNATE ROWS (FOR EXAMPLE ONE MALE
AND FOUR FEMALES)

LEGEND (REFER TO DESCRIPTION FOR DETAILS):
MALE PARENT A
FEMALE PARENT B

FIG. 5



SEQUENCE LISTING 



GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Gene Shears Pty. Limited 

(B) STREET: Suite 1, Building 5, 105 Delhi Road 

(C) CITY: North Ryde 

(D) STATE: North Ryde 

(E) COUNTRY: Australia 

(F) POSTAL CODE (ZIP) : NSW 2113 

(A) NAME: PAUL, Wyatt 

(B) STREET: c/o Nickerson Biocem Ltd, Cambridge Science 

Park, Milton Rd 

(C) CITY: Cambridge 

(D) STATE: Cambridge 

(E) COUNTRY: UK 

(F) POSTAL CODE (ZIP): CB4 5GZ 

(A) NAME: PEREZ, Pascual

(B) STREET: c/o Biogemma, Campus Universitaire des 

Cezeaux 

(C) CITY: 24 Avenue des Landais 

(D) STATE: Aubiere 

(E) COUNTRY: France 

(F) POSTAL CODE (ZIP) : 63170 

(A) NAME: HUTTNER, Eric 

(B) STREET: c/o Groupe Limagrain Pacific Pty Ltd, GPO Box

475 

(C) CITY: Canberra 

(D) STATE: Canberra, ACT 

(E) COUNTRY: Australia 

(F) POSTAL CODE (ZIP) : 2601 

(A) NAME: BETZNER, Andreas Stefan 

(B) STREET: Groupe Limagrain Pacific Pty Ltd, GPO Box 475 

(C) CITY: Canberra

(D) STATE: Canberra, ACT

(E) COUNTRY: Australia 

(F) POSTAL CODE (ZIP): 2601 

(ii) TITLE OF INVENTION: Protein Complementation 
(iii) NUMBER OF SEQUENCES: 68 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(v) CURRENT APPLICATION DATA: 

APPLICATION NUMBER: WO PCT/GB98/00542 



(2) INFORMATION FOR SEQ ID NO: 1: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 344 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS
(B) LOCATION: 9..344

(D) OTHER INFORMATION: /product= "Barnase"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

TCTAGACC ATG GCA CAG GTT ATC AAC ACG TTT GAC GGG GTT GCG GAT TAT 50
Met Ala Gln Val Ile Asn Thr Phe Asp Gly Val Ala Asp Tyr
1 5 10

CTT CAG ACA TAT CAT AAG CTA CCT GAT AAT TAC ATT ACA AAA TCA GAA 98
Leu Gln Thr Tyr His Lys Leu Pro Asp Asn Tyr Ile Thr Lys Ser Glu
15 20 25 30

GCA CAA GCC CTC GGC TGG GTG GCA TCA AAA GGG AAC CTT GCA GAC GTC 146
Ala Gln Ala Leu Gly Trp Val Ala Ser Lys Gly Asn Leu Ala Asp Val
35 40 45

GCT CCG GGG AAA AGC ATC GGC GGA GAC ATC TTC TCA AAC AGG GAA GGC 194
Ala Pro Gly Lys Ser Ile Gly Gly Asp Ile Phe Ser Asn Arg Glu Gly
50 55 60

AAA CTC CCG GGC AAA AGC GGA CGA ACA TGG CGT GAA GCG GAT ATT AAC 242
Lys Leu Pro Gly Lys Ser Gly Arg Thr Trp Arg Glu Ala Asp Ile Asn
65 70 75

TAT ACA TCA GGC TTC AGA AAT TCA GAC CGG ATT CTT TAC TCA AGC GAC 290
Tyr Thr Ser Gly Phe Arg Asn Ser Asp Arg Ile Leu Tyr Ser Ser Asp
80 85 90

TGG CTG ATT TAC AAA ACA ACG GAC CAT TAT CAG ACC TTT ACA AAA ATC 338
Trp Leu Ile Tyr Lys Thr Thr Asp His Tyr Gln Thr Phe Thr Lys Ile
95 100 105 110

AGA TAA 344
Arg
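
The CDS feature of SEQ ID NO: 1 (positions 9..344) can be checked against its translation line with a short sketch using the standard genetic code. The codon for Gln 33 is entered here as CAA, which agrees with the translation line, with SEQ ID NO: 10 and with the B4 primer. The function and variable names are illustrative only.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# SEQ ID NO: 1 as listed, with whitespace removed.
SEQ_ID_1 = "".join("""
TCTAGACC ATG GCA CAG GTT ATC AAC ACG TTT GAC GGG GTT GCG GAT TAT
CTT CAG ACA TAT CAT AAG CTA CCT GAT AAT TAC ATT ACA AAA TCA GAA
GCA CAA GCC CTC GGC TGG GTG GCA TCA AAA GGG AAC CTT GCA GAC GTC
GCT CCG GGG AAA AGC ATC GGC GGA GAC ATC TTC TCA AAC AGG GAA GGC
AAA CTC CCG GGC AAA AGC GGA CGA ACA TGG CGT GAA GCG GAT ATT AAC
TAT ACA TCA GGC TTC AGA AAT TCA GAC CGG ATT CTT TAC TCA AGC GAC
TGG CTG ATT TAC AAA ACA ACG GAC CAT TAT CAG ACC TTT ACA AAA ATC
AGA TAA
""".split())

def translate_cds(dna, start, end):
    """Translate the 1-based inclusive interval start..end.

    A single trailing stop codon, if present, is dropped.
    """
    cds = dna[start - 1:end]
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of three")
    protein = "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))
    return protein[:-1] if protein.endswith("*") else protein

barnase = translate_cds(SEQ_ID_1, 9, 344)

if __name__ == "__main__":
    print(len(SEQ_ID_1), len(barnase))
    print(barnase)
```

Run on the listing as entered, the interval 9..344 translates to 111 residues ending in Lys Ile Arg followed by the TAA stop, which matches the residue numbering printed under the sequence.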



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 112 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



Met Ala Gln Val Ile Asn Thr Phe Asp Gly Val Ala Asp Tyr Leu Gln
1 5 10 15

Thr Tyr His Lys Leu Pro Asp Asn Tyr Ile Thr Lys Ser Glu Ala Gln
20 25 30

Ala Leu Gly Trp Val Ala Ser Lys Gly Asn Leu Ala Asp Val Ala Pro
35 40 45

Gly Lys Ser Ile Gly Gly Asp Ile Phe Ser Asn Arg Glu Gly Lys Leu
50 55 60

Pro Gly Lys Ser Gly Arg Thr Trp Arg Glu Ala Asp Ile Asn Tyr Thr
65 70 75 80

Ser Gly Phe Arg Asn Ser Asp Arg Ile Leu Tyr Ser Ser Asp Trp Leu
85 90 95

Ile Tyr Lys Thr Thr Asp His Tyr Gln Thr Phe Thr Lys Ile Arg
100 105 110



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..18

(D) OTHER INFORMATION: /note= "Figure 1A: B1 primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CATGGTCTAG AGTACTTG 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..16

(D) OTHER INFORMATION: /note= "Figure 1A: B4 primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CCAGCCGAGG GCTTGT 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..16

(D) OTHER INFORMATION: /note= "Figure 1A: B2 primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GCATCAAAAG GGAACC 16

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 228 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1..228

(D) OTHER INFORMATION: /note= "Figure 1B: Intergenic
Sequence" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CGAAAAAAAC GGCTTCCTGC GGAGGCCGTT TTTTTCAGCT TTACATAAAG TGTGTAATAA 60 
ATTTTTCTTC AAACTCTGAT CGGTCAATTT CACTTTCCGG ATCCGGTCCA ATCTGCAGCC 120 
GTCCGAGACA GGAGGACATC GTCCAGCTGA AACCGGGGCA GAATCCGGCC ATTTCTGAAG 180 
AGAAAAATGG TAAACTGATA GAATAAAATC ATAAGAAAGG AGCCGCAC 228 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 323 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..273

(D) OTHER INFORMATION: /product= "Barstar"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ATG AAA AAA GCA GTC ATT AAC GGG GAA CAA ATC AGA AGT ATC AGC GAC 48
Met Lys Lys Ala Val Ile Asn Gly Glu Gln Ile Arg Ser Ile Ser Asp
115 120 125

CTC CAC CAG ACA TTG AAA AAG GAG CTT GCC CTT CCG GAA TAC TAC GGT 96
Leu His Gln Thr Leu Lys Lys Glu Leu Ala Leu Pro Glu Tyr Tyr Gly
130 135 140

GAA AAC CTG GAC GCT TTA TGG GAT TGT CTG ACC GGA TGG GTG GAG TAC 144
Glu Asn Leu Asp Ala Leu Trp Asp Cys Leu Thr Gly Trp Val Glu Tyr
145 150 155 160

CCG CTC GTT TTG GAA TGG AGG CAG TTT GAA CAA AGC AAG CAG CTG ACT 192
Pro Leu Val Leu Glu Trp Arg Gln Phe Glu Gln Ser Lys Gln Leu Thr
165 170 175

GAA AAT GGC GCC GAG AGT GTG CTT CAG GTT TTC CGT GAA GCG AAA GCG 240
Glu Asn Gly Ala Glu Ser Val Leu Gln Val Phe Arg Glu Ala Lys Ala
180 185 190

GAA GGC TGC GAC ATC ACC ATC ATA CTT TCT TAA TACGATCAAT GGGAGATGAA 293
Glu Gly Cys Asp Ile Thr Ile Ile Leu Ser
195 200

CAATATAGAT CCCCCGGGCT GCAGGAATTC 323 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Lys Lys Ala Val Ile Asn Gly Glu Gln Ile Arg Ser Ile Ser Asp
1 5 10 15

Leu His Gln Thr Leu Lys Lys Glu Leu Ala Leu Pro Glu Tyr Tyr Gly
20 25 30

Glu Asn Leu Asp Ala Leu Trp Asp Cys Leu Thr Gly Trp Val Glu Tyr
35 40 45

Pro Leu Val Leu Glu Trp Arg Gln Phe Glu Gln Ser Lys Gln Leu Thr
50 55 60

Glu Asn Gly Ala Glu Ser Val Leu Gln Val Phe Arg Glu Ala Lys Ala
65 70 75 80

Glu Gly Cys Asp Ile Thr Ile Ile Leu Ser
85 90

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..21

(D) OTHER INFORMATION: /note= "Figure 1C: B3 primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
TAATACGATC AATGGGAGAT G 21
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 194 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 9..194

(D) OTHER INFORMATION: /note= "Figure 1D"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TCTAGACC ATG GCA CAG GTT ATC AAC ACG TTT GAC GGG GTT GCG GAT TAT 50
Met Ala Gln Val Ile Asn Thr Phe Asp Gly Val Ala Asp Tyr
95 100 105

CTT CAG ACA TAT CAT AAG CTA CCT GAT AAT TAC ATT ACA AAA TCA GAA 98
Leu Gln Thr Tyr His Lys Leu Pro Asp Asn Tyr Ile Thr Lys Ser Glu
110 115 120

GCA CAA GCC CTC GGC TGG ATG GGC GGT GGC GGT TCC GGT GGC GGT GGC 146
Ala Gln Ala Leu Gly Trp Met Gly Gly Gly Gly Ser Gly Gly Gly Gly
125 130 135

AGC GGC GGC GGT GGT AGC GGG ATC CCC GGG TAC GGT CAG TCC CTT ATG 194
Ser Gly Gly Gly Gly Ser Gly Ile Pro Gly Tyr Gly Gln Ser Leu Met
140 145 150



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 62 amino acids
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Ala Gln Val Ile Asn Thr Phe Asp Gly Val Ala Asp Tyr Leu Gln
1 5 10 15

Thr Tyr His Lys Leu Pro Asp Asn Tyr Ile Thr Lys Ser Glu Ala Gln
20 25 30

Ala Leu Gly Trp Met Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly
35 40 45

Gly Gly Gly Ser Gly Ile Pro Gly Tyr Gly Gln Ser Leu Met
50 55 60
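
To make the layout of SEQ ID NO: 11 easier to read, the following sketch converts the three-letter listing to one-letter code and locates a (Gly4 Ser)3 linker of the kind named in the figure for the S-peptide fusion. The helper names are hypothetical, and treating the run of Gly and Ser residues as that linker is an interpretation rather than a statement from the listing.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# SEQ ID NO: 11 in three-letter code, as listed.
SEQ_ID_11 = """
Met Ala Gln Val Ile Asn Thr Phe Asp Gly Val Ala Asp Tyr Leu Gln
Thr Tyr His Lys Leu Pro Asp Asn Tyr Ile Thr Lys Ser Glu Ala Gln
Ala Leu Gly Trp Met Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly
Gly Gly Gly Ser Gly Ile Pro Gly Tyr Gly Gln Ser Leu Met
"""

def to_one_letter(three_letter_text):
    """Convert whitespace-separated three-letter residues to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_text.split())

protein = to_one_letter(SEQ_ID_11)
LINKER = "GGGGS" * 3                      # (Gly4 Ser)3
linker_start = protein.find(LINKER)       # 0-based index, -1 if absent

if __name__ == "__main__":
    print(protein)
    print("before linker:", protein[:linker_start])
    print("after linker: ", protein[linker_start + len(LINKER):])
```

On the sequence as listed, the linker follows the first 37 residues and is itself followed by a ten-residue tail, so the 62 residues split into 37 + 15 + 10.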

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 526 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..526

(D) OTHER INFORMATION: /note= "Figure 1E"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

TCTAGACCAT GCAGATCTTC GTGAAAACCT TGACCGGCAA GACCATCACT CTCGAGGTCG 60 

AGAGCAGCGA CACCATCGAC AATGTCAAGG CCAAGATCCA AGACAAAGAA GGTATCATTC 120 

TTCCTCACTC AATCTGGATT CTTCTCTTTA GCTTTTTGAA ATTCAGATCT CTTATCATTT 180 

ACTTGTTTCT CCTTTAAGGA ATCCCTCCGG ATCAGCAGAG ATTGATCTTC GCCGGAAAGC 240

AGCTCGAAGA TGGCCGTACT TTGGCTGACT ACAACATCCA GAAAGGTACG AAATCATCCG 300

AATCCTTCTG TTGATCATTT CGATGATCTG ATTGTATAAA CTCTAATGGA TTGTTATCAT 360 

TTGTAAACAG AATCTACACT TCATCTTGTG TTGAGGCTTA GAGGTGGAGC ACAGGTTATC 420 

AACACGTTTG ACGGGGTTGC GGATTATCTT CAGACATATC ATAAGCTACC TGATAATTAC 480



ATTACAAAAT CAGAAGCACA AGCCCTCGGC TGGATGTAGA GGATCC 526



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 631 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..631

(D) OTHER INFORMATION: /note= "Figure 1F"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TCTAGACCAT GCAGATCTTC GTGAAAACCT TGACCGGCAA GACCATCACT CTCGAGGTCG 60 

AGAGCAGCGA CCATCGACAA TGTCAAGGCC AAGATCCAAG ACAAAGAAGG TATCATTCTT 120

CCTCACTCAA TCTGGATTCT TCTCTTTAGC TTTTTGAAAT TCAGATCTCT TATCATTTAC 180 

TTGTTTCTCC TTTAAGGAAT CCCTCCGGAT CAGCAGAGAT TGATCTTCGC CGGAAAGCAG 240

CTCGAAGATG GCCGTACTTT GGCTGACTAC AACATCCAGA AAGGTACGAA ATCATCCGAA 300 

TCCTTCTGTT GATCATTTCG ATGATCTGAT TGTATAAACT CTAATGGATT GTTATCATTT 360 

GTAAACAGAA TCTACACTTC ATCTTGTGTT GAGGCTTAGA GGTGGAGCAT CAAAAGGGAA 420 

CCTTGCAGAC GTCGCTCCGG GGAAAAGCAT CGGCGGAGAC ATCTTCTCAA ACAGGGAAGG 480 

CAAACTCCCG GGCAAAAGCG GACGAACATG GCGTGAAGCG GATATTAACT ATACATCAGG 540

CTTCAGAAAT TCAGACCGGA TTCTTTACTC AAGCGACTGG CTGATTTACA AAACAACGGA 600 



CCATTATCAG ACCTTTACAA AAATCAGATA A 631

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..20

(D) OTHER INFORMATION: /note= "Figure 1G: B5"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
CACAAGTACT CTAGACCATG 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..19

(D) OTHER INFORMATION: /note= "Figure 1G

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CATCCAGCCG AGGGCTTGT 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..16

(D) OTHER INFORMATION: /note= "Figure 1G

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GGCGGTGGCG GTTCCG 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..23

(D) OTHER INFORMATION: /note= "Figure 1G



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CCACTAGTTC TAGAGTACTT GTG 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..18

(D) OTHER INFORMATION: /note= "Figure 1G

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GCACAGGTTA TCAACACG 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..31

(D) OTHER INFORMATION: /note= "Figure 1G

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GCGGATCCTC TACATCCAGC CGAGGGCTTG T 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..16

(D) OTHER INFORMATION: /note= "Figure 1G: Bll" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GCATCAAAAG GGAACC 16 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..17

(D) OTHER INFORMATION: /note= "Figure 1G: B12"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GGTCTAGAGT ACTTGTG 17 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..30

(D) OTHER INFORMATION: /note= "Figure 1G: Ubql6F"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

GCTCTAGACC ATGCAGATCT TCGTGAAAAC 30 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..25

(D) OTHER INFORMATION: /note= "Figure 1G: UbqlR"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
CTGGATCCAC CTCTAAGCCT CAACA 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..24

(D) OTHER INFORMATION: /note= "Figure 1G: Ubqla" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
TATGGATCCC CCGGGCTGCA GGAA 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..21

(D) OTHER INFORMATION: /note= "Figure 1G: Ubqib"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
TCCACCTCTA AGCCTCAACA C 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..23

(D) OTHER INFORMATION: /note= "Fig 3A: lane 1, Primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
GCGGATCCAT GAAGGAGACC GCC 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..56

(D) OTHER INFORMATION: /note= "Fig 3A: lane 2, RNI"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GCGGATCCAT GAAGGAGACC GCCGCCGCCA AGTTCGAGCG CCAGCACATG GACAGC 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..22

(D) OTHER INFORMATION: /note= "Fig 3A: lane 3, RNI"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CATAGATCTT TAGCTGTCCA TG 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..28

(D) OTHER INFORMATION: /note= "Fig 3A: lane 4, Primer
RN-d" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CCAGATCTAT GAGCTCCTCC AACTACTG 
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..63

(D) OTHER INFORMATION: /note= "Fig 3A: lanes 2/5, RNII" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
TAAAGATCTA TGAGCACCTC CGCCGCCAGC TCCTCCAACT ACTGCAACCA GATGATGAAG
TCT 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..21

(D) OTHER INFORMATION: /note= "Fig 3A: lane 6, RN2"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TCAGGTTCCT AGACTTCATC A 
(2) INFORMATION FOR SEQ ID NO: 32: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..59

(D) OTHER INFORMATION: /note= "Fig 3A, lanes 5/7, RNIII"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
AGGAACCTGA CCAAGGACAG GTGCAAGCCA GTCAACACCT TCGTCCACGA GAGCCTGGC 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..19

(D) OTHER INFORMATION: /note= "Fig 3A: lane 8, RN3"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CTGGACATCG GCCAGGCTC
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..48

(D) OTHER INFORMATION: /note= "Fig 3A, lanes 7/9, RN IV"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
CGATGTCCAG GCCGTCTGCA GCCAGAAGAA CGTGGCCTGC AAGAACGG 48



(2) INFORMATION FOR SEQ ID NO: 35: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1..21

(D) OTHER INFORMATION: /note= "Fig 3A: lane 10, RN 4"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
AGTTGGTCTG ACCGTTCTTG C 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 60 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..60

(D) OTHER INFORMATION: /note= "Fig 3A: lanes 9/11, RN V"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
TCAGACCAAC TGCTACCAGT CCTACAGCAC CATGTCCATC ACCGACTGCC GCGAGACCGG 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..19

(D) OTHER INFORMATION: /note= "Fig 3A: lane 12, RN5"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 






CTTGCTGGAG CCGGTCTCG 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..55

(D) OTHER INFORMATION: /note= "Fig 3A: lanes 11/13, RN VI" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CTCCAGCAAG TACCCTAACT GCGCCTACAA GACCACCCAG GCCAACAAGC ACATC 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..21

(D) OTHER INFORMATION: /note= "Fig 3A: lane 14, RN 6"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CAGGCAACAA TGATGTGCTT G 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..24

(D) OTHER INFORMATION: /note= "Fig 3A: lane 15, Primer
RN-b" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
CGGGATCCTT TAGACGGAGG CGTC 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..66

(D) OTHER INFORMATION: /note= "Fig 3A: lanes 13/16, RN VII"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
ATTGTTGCCT GCGAGGGTAA CCCTTACGTG CCTGTCCACT TCGACGCCTC CGTCTAAAGG 
ATCCCG 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..23

(D) OTHER INFORMATION: /note= "Fig 3B: lane 1, PCR Primer RNa"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GCGGATCCAT GAAGGAGACC GCC 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..56

(D) OTHER INFORMATION: /note= "Fig 3B, lane 2, RN I"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GCGGATCCAT GAAGGAGACC GCCGCCGCCA AGTTCGAGCG CCAGCACATG GACAGC
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..18

(D) OTHER INFORMATION: /note= "Fig 3B, lane 3, RN 7"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
CCACCGCCGC TGTCCATG 
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..57

(D) OTHER INFORMATION: /note= "Fig 3B: lanes 2/4, RN VIII"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
GGCGGTGGCG GTTCCGGTGG CGGTGGCAGC GGCGGCGGTG GTAGCAAGAT CTTCGGG 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA



(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..18

(D) OTHER INFORMATION: /note= "Fig 3B: lane 5, RN c"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
CCCGAAGATC TTGCTACC 18 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..45

(D) OTHER INFORMATION: /note= "Fig 4, lane 1"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
AAAGAGACAG CAGCCGCAAA GTTTGAGCGT CAGCATATGG ATAGT 45 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(ix) FEATURE: 

(A) NAME/KEY: Peptide

(B) LOCATION: 1..32

(D) OTHER INFORMATION: /note= "Fig 4A: lane 2"

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(B) LOCATION: 17

(D) OTHER INFORMATION: /note= "Xaa = corresponds to an ochre stop codon (UAA)"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Met Lys Glu Thr Ala Ala Ala Lys Phe Glu Arg Gln His Met Asp Ser
1               5                   10                  15

Xaa Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser
            20                  25                  30



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..63

(D) OTHER INFORMATION: /note= "Fig 4A: lane 3"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
GGATCCATGA AGGAGACCGC CGCCGCCAAG TTCGAGCGCC AGCACATGGA CAGCTAAAGA 
TCT 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 106 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..106

(D) OTHER INFORMATION: /note= "Fig 4A: lane 4" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GGATCCATGA AGGAGACCGC CGCCGCCAAG TTCGAGCGCC AGCACATGGA CAGCGGCGGT 
GGCGGTTCCG GTGGCGGTGG CAGCGGCGGC GGTGGTAGCA AGATCT 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 330 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 



(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..330

(D) OTHER INFORMATION: /note= "Fig 4B: lane 1" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

AGCACCAGTG CTGCCAGTTC TTCCAACTAC TGTAACCAGA TGATGAAGTC TAGAAACTTG 60

ACCAAGGACA GATGTAAGCC AGTTAACACA TTTGTCCACG AGAGTTTGGC TGATGTCCAA 120

GCCGTCTGCA GTCAGAAAAA CGTTGCATGC AAGAACGGTC AAACGAACTG TTACCAGAGT 180

TACAGCACCA TGTCCATCAC TGACTGTCGT GAGACAGGCT CGAGCAAGTA TCCTAATTGT 240 

GCTTACAAGA CCACACAGGC GAACAAACAC ATCATTGTTG CTTGTGAAGG TAACCCTTAC 300 

GTTCCTGTCC ACTTTGACGC CAGTGTTTAA 330 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 132 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(ix) FEATURE: 

(A) NAME/KEY: Peptide

(B) LOCATION: 1..132

(D) OTHER INFORMATION: /note= "Fig 4B: lane 2" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

Met Ser Thr Ser Ala Ala Ser Ser Ser Asn Tyr Cys Asn Gln Met Met
1               5                   10                  15

Lys Ser Arg Asn Leu Thr Lys Asp Arg Cys Lys Pro Val Asn Thr Phe
            20                  25                  30

Val His Glu Ser Leu Ala Asp Val Gln Ala Val Cys Ser Gln Lys Asn
        35                  40                  45

Val Ala Cys Lys Asn Gly Gln Thr Asn Cys Tyr Gln Ser Tyr Ser Thr
    50                  55                  60

Met Ser Ile Thr Asp Cys Arg Glu Thr Gly Ser Ser Lys Tyr Pro Asn
65                  70                  75                  80

Cys Ala Tyr Lys Thr Thr Gln Ala Asn Thr Asp Cys Arg Glu Thr Gly
                85                  90                  95

Ser Ser Lys Tyr Pro Asn Cys Ala Tyr Lys Thr Thr Gln Ala Asn Lys
            100                 105                 110

His Ile Ile Val Ala Cys Glu Gly Asn Pro Tyr Val Pro Val His Phe
        115                 120                 125

Asp Ala Ser Val
    130

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 346 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..330

(D) OTHER INFORMATION: /note= "Fig 4B, lane 3"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

AGATCTATGA GCACCTCCGC CGCCAGCTCC TCCAACTACT GCAACCAGAT GATGAAGTCT 60 

AGGAACCTGA CCAAGGACAG GTGCAAGCCA GTCAACACCT TCGTCCACGA GAGCCTGGCC 120 

GATGTCCAGG CCGTCTGCAG CCAGAAGAAC GTGGCCTGCA AGAACGGTCA GACCAACTGC 180

TACCAGTCCT ACAGCACCAT GTCCATCACC GACTGCCGCG AGACCGGCTC CAGCAAGTAC 240 

CCTAACTGCG CCTACAAGAC CACCCAGGCC AACAAGCACA TCATTGTTGC CTGCGAGGGT 300 

AACCCTTACG TGCCTGTCCA CTTCGACGCC TCCGTCTAAA GGATCC 346 
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 331 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..331

(D) OTHER INFORMATION: /note= "Fig 4B, lane 4"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

AGATCTATGA GCTCCTCCAA CTACTGCAAC CAGATGATGA AGTCTAGGAA CCTGACCAAG 60

GACAGGTGCA AGCCAGTCAA CACCTCCGTC CACGAGAGCC TGGCCGATGT CCAGGCCGTC 120

TGCAGCCAGA AGAACGTGGC CTGCAAGAAC GGTCAGACCA ACTGCTACCA GTCCTACAGC 180

ACCATGTCCA TCACCGACTG CCGCGAGACC GGCTCCAGCA AGTACCCTAA CTGCGCCTAC 240

AAGACCACAC AGGCCAACAA GCACATCATT GTTGCCTGCG AGGGTAACCC TTACGTGCCT 300

GTCCACTTCG ACGCCTCCGT CTAAAGGATC C 331



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 163 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..163

(D) OTHER INFORMATION: /note= "Fig 4C i"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
TCTAGATCTT AACATGAAGA ATGTTTTAGT AAGGTCAGCT GCGCGAGCTC TGCTTGGCGG 60
CGGTGGGCGG AGCTACTACC GCCAGCTCTC AACGGCGGCG ATCGTGGAAC AGAGACACCA 120 
GCACGGTGGC GGCGCGTTTG GAAGCTTCCA CTTAAGCGGA TCC 163 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 198 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..198

(D) OTHER INFORMATION: /note= "Fig 4C ii"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
ATGAAGAATG TTTTAGTAAG GTCAGCTGCG CGAGCTCTGC TTGGCGGCGG TGGGCGGAGC 60
TACTACCGCC AGCTCTCAAC GGCGGCGATC GTGGAACAGA GACACCAGCA CGGTGGCGGC 120
GCGTTTGGAA GCTTCCACTT AAGAAGGATG AAGGAGACCG CCGCCGCCAA GTTCGAGCGC 180
CAGCACATGG ACAGCTAA 198



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 270 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..270

(D) OTHER INFORMATION: /note= "Fig 4C iii"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
ATGAAGAATG TTTTAGTAAG GTCAGCTGCG CGAGCTCTGC TTGGCGGCGG TGGGCGGAGC
TACTACCGCC AGCTCTCAAC GGCGGCGATC GTGGAACAGA GACACCAGCA CGGTGGCGGC 
GCGTTTGGAA GCTTCCACTT AAGAAGGATG AAGGAGACCG CCGCCGCCAA GTTCGAGCGC 
CAGCACATGG ACAGCGGCGG TGGCGGTTCC GGTGGCGGTG GCAGCGGCGG CGGTGGTAGC 
GGGATCCCCG GGTACGGTCA GTCCCTTATG 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 465 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..465

(D) OTHER INFORMATION: /note= "Fig 4C iv"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

ATGAAGAATG TTTTAGTAAG GTCAGCTGCG CGAGCTCTGC TTGGCGGCGG TGGGCGGAGC 60

TACTACCGCC AGCTCTCAAC GGCGGCGATC GTGGAACAGA GACACCAGCA CGGTGGCGGC 120

GCGTTTGGAA GCTTCCACTT AAGAAGGATG AGCTCCTCCA ACTACTGCAA CCAGATGATG 180

AAGTCTAGGA ACCTGACCAA GGACAGGTGC AAGCCAGTCA ACACCTCCGT CCACGAGAGC 240

CTGGCCGATG TCCAGGCCGT CTGCAGCCAG AAGAACGTGG CCTGCAAGAA CGGTCAGACC 300

AACTGCTACC AGTCCTACAG CACCATGTCC ATCACCGACT GCCGCGAGAC CGGCTCCAGC 360

AAGTACCCTA ACTGCGCCTA CAAGACCACA CAGGCCAACA AGCACATCAT TGTTGCCTGC 420

GAGGGTAACC CTTACGTGCC TGTCCACTTC GACGCCTCCG TCTAA 465

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 715 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..715

(D) OTHER INFORMATION: /note= "Fig 4C v" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 

ATGCAGATCT TCGTGAAAAC CTTGACCGGC AAGACCATCA CTCTCGAGGT CGAGAGCAGC 60

GACACCATCG ACAATGTCAA GGCCAAGATC CAAGACAAAG AAGGTATCAT TCTTCCTCAC 120

TCAATCTGGA TTCTTCTCTT TAGCTTTTTG AAATTCAGAT CTCTTATCAT TTACTTGTTT 180

CTCCTTTAAG GAATCCCTCC GGATCAGCAG AGATTGATCT TCGCCGGAAA GCAGCTCGAA 240

GATGGCCGTA CTTTGGCTGA CTACAACATC CAGAAAGGTA CGAAATCATC CGAATCCTTC 300

TGTTGATCAT TTCGATGATC TGATTGTATA AACTCTAATG GATTGTTATC ATTTGTAAAC 360

AGAATCTACA CTTCATCTTG TGTTGAGGCT TAGAGGTGGA TCCAGCTCCA ACTACTGCAA 420

CCAGATGATG AAGTCTAGGA ACCTGACCAA GGACAGGTGC AAGCCAGTCA ACACCTCCGT 480

CCACGAGAGC CTGGCCGATG TCCAGGCCGT CTGCAGCCAG AAGAACGTGG CCTGCAAGAA 540

CGGTCAGACC AACTGCTACC AGTCCTACAG CACCATGTCC ATCACCGACT GCCGCGAGAC 600 

CGGCTCCAGC AAGTACCCTA ACTGCGCCTA CAAGACCACA CAGGCCAACA AGCACATCAT 660 

TGTTGCCTGC GAGGGTAACC CTTACGTGCC TGTCCACTTC GACGCCTCCG TCTAA 715 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
GGTGGATCCA GCTCCAACTA CTGCAAC 
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CGGGATCCTT AGACGGAGGC GTCG 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
GTCCTTAAGA AGGATGAGCT CCTCCAACTA C
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
CGGGATCCTT AGACGGAGGC GTCG 
(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 



(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

GTCCTTAAGA AGGATGAAGG AGACCGCCG 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
TCGGGATCCT TAGCTGTCCA TGTGCTG 
(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
TCGGGATCCT CATTGTTTGC CTCCCTG 
(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
TGCTCTAGAT CTTAACATGA AGAATGTTTT AG



(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
TCGGATCCGC TTAAGTGGAA GCTTCCAAAC 